Abstract

Combat  target  identification  (CID)  is  the  process  by  which  detected  objects 
are  characterized  pursuant  to  military  action.  Errors  in  CID  such  as  mis-labeling 
targets  and  non-targets  carry  significant  costs.  Fusing  data  from  multiple  sources 
and  allowing  a  rejection,  or  non-declare,  option  can  improve  CID  error  rates. 

This  research  extends  a  mathematical  framework  that  selects  the  optimal 
sensor  ensemble  and  fusion  method  across  multiple  decision  thresholds  subject  to 
warfighter  constraints.  The  formulation  includes  treatment  of  exemplars  from  target 
classes on which the CID system classifiers are not trained (out-of-library classes),
and  it  enables  the  warfighter  to  optimize  a  CID  system  without  explicit  enumeration 
of  classifier  error  costs. 

A time-series classifier design methodology is developed and applied, resulting
in a multivariate Gaussian hidden Markov model (HMM) with a specially constructed
hidden state space. The extended CID framework is used to compete the
HMM-based CID system against a template-based CID system. The assessment
uses a real-world synthetic aperture radar (SAR) data collection comprising ten
in-library target classes and five out-of-library target classes. The framework evaluates
competing classifier systems that use multiple fusion methods, including neural
network fusion and label fusion, varied prior probabilities of targets and non-targets,
varied correlation between multiple sensor looks, and varied levels of target pose
estimation error. Also, an on-line target pose estimator is developed using principal
component analysis of masked target SAR images. This estimator validates
experimental assumptions on target pose prior to classification.

The CID system assessment using the extended framework reveals larger feasible
operating regions for the HMM-based classifier across experimental settings.
In some cases the HMM-based classifier yields a feasible region that is 25% of the
threshold operating space versus 1% for the template-based classifier. Similar performance
results are obtained for rule-based label fusion and the more complex neural
network fusion and are explained by the new ability to independently set classifier
thresholds with the label fusion method.

Combat Identification with Sequential Observations, Rejection
Option, and Out-of-Library Targets


1.  Introduction 

1.1  Background 

The research reported in this dissertation stems from a study of pattern recognition
applied to modern warfare. Two thousand years ago the Chinese military
philosopher Sun Tzu wrote, “if you know the enemy and know yourself, you need
not fear the results of one hundred battles [1].” Thus, perfect knowledge of your
enemy, his assets, and their location, coupled with knowledge of your own assets,
locations, and capabilities, provides the military leader an undeniable advantage over
his adversary.

United  States  Armed  Forces  doctrine,  and  US  Air  Force  (USAF)  doctrine  in 
particular,  have  made  the  ancient  truism  the  official  practice  of  the  US  military. 
USAF  doctrine  document  AFDD  2-1,  entitled  Air  Warfare,  relates  that  if  an  enemy’s 
key targets can be found and identified, then air power can be applied [2]. Thus,
identifying,  or  classifying,  a  target  is  a  critical  link  in  the  kill  chain  that  begins  with 
finding  a  target,  includes  engaging  the  target,  and  ends  with  assessing  the  outcome 
of  the  engagement.  The  US  military  defines  combat  identification  (CID)  as 

the  process  of  attaining  an  accurate  characterization  of  detected  objects 
in the joint battlespace to the extent that high confidence, timely application
of military options and weapons resources can occur [3].

Figure 1 depicts the CID problem from the combat pilot’s perspective. The true
nature of the entities sharing the battlespace is unknown. Here CID characterizes
those entities using information from a variety of sources. The goal of CID is to maximize
operational effectiveness by neutralizing the enemy with an efficient allocation
of combat resources while minimizing friendly casualties [4]. Friendly casualties may
result from either enemy or friendly fire, the latter commonly called fratricide. By
improving CID performance, friendly casualties are reduced on both fronts: fewer enemy
forces remain to engage friendly units, and fewer friendly units are misidentified.

Figure 1. The real combat identification problem: battlespace characterization
from the combat pilot’s perspective. Figure originally presented by Mr.
Charles Sadowski, ACC/DRSA [4].

Doctrinal  links  with  CID  can  be  found  in  joint  and  Air  Force  doctrine.  In 
Joint  Vision  2020  the  Chairman  of  the  Joint  Chiefs  of  Staff  provides  a  template  for 
the  transformation  of  the  US  Armed  Forces.  In  this  document  CID  impacts  three 
of  four  operational  concepts:  precision  engagement,  dominant  maneuver,  and  full 
dimensional  protection  [5]. 

The  US  Armed  Forces  recognize  the  principles  of  war  as  fundamental  guidance 
for  the  application  of  military  power.  They  are  listed  and  defined  in  Joint  Warfare 
of  the  Armed  Forces  of  the  United  States,  JP  1,  the  capstone  joint  warfare  doctrine 
document [6]. Accurately identifying targets in a timely manner supports the principles
of offense, economy of force, and surprise, thus affording an advantage over an
adversary without a similar capability.

Among the seven tenets of aerospace power, which complement the principles
of war and reflect the evolution of airpower, Air Force doctrine lists decentralized
execution of air and space power. Decentralized execution is

the delegation of execution authority to responsible and capable lower-level
commanders to achieve effective span of control and to foster disciplined
initiative, situational responsiveness, and tactical flexibility. It
allows  subordinates  to  exploit  opportunities  in  rapidly  changing,  fluid 
situations.  [7] 

Improved  target  recognition  systems  allow  operators  to  respond  quickly  and  provide 
greater  flexibility  in  their  responsiveness. 

Air  Force  Basic  Doctrine,  AFDD  1,  lists  six  distinctive  capabilities,  or  areas  of 
expertise,  of  the  Air  Force.  Of  these  six  distinctive  capabilities,  Global  Attack  and 
Precision  Engagement  are  directly  impacted  by  improvements  to  target  recognition 
systems.  Global  Attack  refers  to  the  “ability  of  the  Air  Force  to  attack  rapidly  and 
persistently with a wide range of munitions anywhere on the globe at any time [7].”
Precision  Engagement  refers  to  air  and  space  power’s  ability  “to  apply  discriminate 
force  precisely  where  required  [7].” 

At  a  more  detailed  level  of  airpower  application,  AFDD  1  lists  seventeen  key 
operational  functions  of  the  Air  Force.  Of  those  listed,  improved  target  recognition 
systems  positively  impact  the  following  functions: 

•  Strategic  Attack,  defined  as  offensive  action  conducted  by  command 
authorities  aimed  at  generating  effects  that  most  directly  achieve 
national  security  objectives  by  affecting  the  adversary’s  leadership, 
conflict-sustaining  resources,  and  strategy 

• Counterair, defined as operations that attain and maintain a desired
degree of air superiority by the destruction, degradation, or
disruption  of  enemy  forces 

•  Counterland,  defined  as  air  and  space  operations  against  enemy  land 
force  capabilities  to  create  effects  that  achieve  JFC  (Joint  Forces 
Commander)  objectives 

• Countersea, defined as functions that extend Air Force capabilities
into a maritime environment

• Surveillance and Reconnaissance, defined as systematically observing
air, space, surface, or subsurface areas, places, persons, or things,
by visual, aural, electronic, photographic, or other means ... designed
to  provide  warning  of  enemy  initiatives  and  threats  and  to  detect 
changes  in  enemy  activities  [7] 

The last function listed above, Surveillance and Reconnaissance, is covered
more fully in two top-level Air Force doctrine documents: AFDD 2-5.2, Intelligence,
Surveillance, and Reconnaissance Operations [8], and AFPAM 14-210, United States
Air Force Targeting Guide [9]. AFDD 2-5.2 outlines the principles and doctrine
for intelligence, surveillance, and reconnaissance (ISR). AFPAM 14-210 explains
the principles and concepts of targeting, a core Air Force discipline which integrates
intelligence information about targets with operational information about friendly
objectives, capabilities, and doctrine.

Both  documents  describe  the  process  of  information  fusion.  The  ISR-derived 
information  from  many  sources  is  combined,  evaluated,  and  analyzed  in  a  process 
called  fusion.  Fusion  is  listed  as  one  of  eleven  ISR  principles  in  AFDD  2-5.2  [8], 
and  AFPAM  14-210  defines  fusion  as  the  process  of  combining  multi-source  data 
into  intelligence  necessary  for  decision  making  and  highlights  fusion  as  a  guiding 
principle  in  the  targeting  process. 

While identifying and defining fusion as an important principle in intelligence
gathering and processing, neither document provides guidance for carrying out
multi-source fusion. Indeed, intelligently automating the fusion of information from
multiple sources, or sensors, would improve ISR operations by making more accurate
target identifications and would speed the targeting timeline by lessening reliance on
human interpretation.

The  Air  Force  places  great  emphasis  on  the  importance  of  recognizing  and 
fostering technological advances in order to improve warfighting capabilities.
AFDD 1 describes Technology-to-warfighting, one of three Air Force core competencies,
as follows:

As a leader in the military application of air, space, and intelligence,
surveillance, and reconnaissance technology, the Air Force is committed
to innovation to guide research, development, and fielding of unsurpassed
capabilities. Just as the advent of powered flight revolutionized
joint  warfighting,  recent  advances  in  low  observable  technologies;  space- 
based  systems;  manipulation  of  information;  precision;  and  small,  smart 
weapons  offer  no  less  dramatic  advantages  for  combatant  commanders. 

The Air Force nurtures and promotes its ability to translate our technology
into operational capability to prevail in conflict and avert technological
surprise. [7]

Research  in  the  area  of  target  recognition  systems  fits  directly  under  the  umbrella 
of  this  core  competency  of  the  Air  Force  and  is  supported  by  Joint  and  Air  Force 
doctrine. 


1.2  Problem  Statement 

With  sound  doctrinal  support  for  research  in  the  area  of  CID  explained  in 
Section  1.1,  this  section  details  problems  addressed  by  this  dissertation.  A  notional 
CID  system  is  shown  in  Figure  2.  Observations  through  time  of  a  region  of  interest 
are made by two sensors, s_1 and s_2. Sensor data D is processed into features F
which are then classified into labels L before being fused into final labels L_final.

[Figure 2 diagram: E -> (sensor) -> data D -> (processor) -> features F -> (classifier) -> labels L -> (fusion) -> decision L_final]


Figure  2.  Notional  CID  system  with  two  sensors  evaluating  observations  through 
time  t  =  T. 

Air Force doctrine stipulates that the targeting process must gather information
to reach a desired level of labeling confidence prior to making a shoot decision
[8, 2]. Two paths to improved classifier confidence are temporal fusion, or fusion
of  sequential  observations,  and  sensor  fusion,  or  fusion  across  sensors.  Both  fusion 
methods  attempt  to  improve  classification  performance  by  combining  information 
contained  in  multiple  observations.  With  temporal  fusion  the  classification  system 
processes  a  sequence  of  event  observations.  Observations  may  be  autocorrelated 
and  additional  observations  may  provide  information  beneficial  to  the  classification 
process,  or  they  may  confuse  the  classifier,  producing  undesirable  results. 

Fusion of multiple sensors is considered when designing multiple classifier systems
(MCS). The architect must design both an ensemble of classifiers and a fusion
rule  with  which  to  combine  the  individual  classifier  outputs.  The  MCS  performance 
depends  on  an  ensemble  whose  classifiers  make  disjoint  errors  (i.e.,  classifier  A  and 
classifier  B  errors  occur  in  non-overlapping  areas  of  the  feature  space),  and  a  fusion 
rule  which  takes  advantage  of  relative  strengths  of  the  constituent  classifiers  [10]. 

Given a CID system, the warfighter requires a label-space that is less rigid than
forced-decision [4]. A forced-decision classifier trained to recognize objects in class
A, B, C, or D maps every test record into one of four possible classes. Warfighters
require that a reject option be given to the classifier which allows it to opt against
the forced-decision label and for a “non-declaration” label.

Thus, the warfighter requires at least a trichotomous label space for the CID
system. Using the example above, data class A is labeled “hostile”, data classes B,
C, and D are labeled “friend”, and when the classifier does not achieve the desired
labeling confidence it applies the third label, “non-declared”.
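
The trichotomous label space can be illustrated with a minimal sketch. It assumes the classifier returns a posterior probability for each of the classes A through D, and that a single confidence threshold governs declaration. The function name label_with_reject and the value 0.8 are hypothetical choices for illustration, not settings taken from this research.

```python
def label_with_reject(posteriors, hostile_classes=("A",), threshold=0.8):
    """Map class posteriors to 'hostile', 'friend', or 'non-declared'.

    posteriors: dict of class name -> posterior probability (assumed to sum to 1).
    hostile_classes: classes grouped under the 'hostile' label (assumption).
    threshold: required labeling confidence (hypothetical value).
    """
    # Aggregate the class posteriors into the two declared labels.
    p_hostile = sum(p for c, p in posteriors.items() if c in hostile_classes)
    p_friend = sum(p for c, p in posteriors.items() if c not in hostile_classes)

    if p_hostile >= threshold:
        return "hostile"
    if p_friend >= threshold:
        return "friend"
    # Neither label reaches the desired confidence: exercise the reject option.
    return "non-declared"


if __name__ == "__main__":
    print(label_with_reject({"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}))
```

Raising the threshold trades declared labels for non-declarations, which is the trade-off the warfighter constraints are meant to govern.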

Optimizing  classifiers  with  a  reject  option  has  been  studied  [11,  12,  13,  14],  but 
invariably  the  optimal  decision  boundaries  rely  on  a  set  cost  rule  for  classifier  errors. 
Laine’s  research  [15]  proposes  a  methodology  for  optimizing  a  rejection-capable  CID 
system  without  explicit  error  costs.  Thus  the  warfighter  does  not  specify  the  relative 
cost of a fratricide incident versus collateral damage versus a successful engagement.

One useful extension of the trichotomous label-space of a rejection-capable
CID system is the incorporation of an “out-of-library” label. A CID system can be
thought of as a simple classifier trained on exemplars from a specified set of target
classes. The union of the target classes constitutes the library of the classifier. An
exemplar is said to be “in-library” if it is from a target class which the classifier
has been trained to recognize, and it is “out-of-library” otherwise. It is likely that a
fielded CID system will encounter targets in out-of-library classes [4].

The  goals  of  this  research  include  the  development  of  a  robust,  time-series  MCS 
for  use  in  an  extended  CID  optimization  framework  that  includes  both  a  rejection 
option  and  in-library  and  out-of-library  discrimination.  In  addition,  the  effects  of 
data  correlation  in  a  temporally-fused  MCS,  data  prevalence,  and  extended  operating 
conditions  are  examined.  Also,  means  of  performance  assessment  are  developed. 

For  the  foreseeable  future  air  operations  will  require  timely  acquisition  of  and 
precise engagement against targets regardless of environmental conditions while
minimizing collateral damage. This research focuses on the sensor processing and decision
making  parts  of  the  kill  chain. 

1.3  Scope 

The  scope  of  research  for  this  dissertation  includes  the  methodology  used  to 
design  a  temporally-fused  MCS  in  a  CID  setting.  Much  attention  in  the  Department 
of  Defense  has  been  paid  to  the  development  of  a  family-of-systems  (FOS)  networked 
together to provide a joint service CID capability [16]. Some of these FOS systems
are  cooperative  identifiers,  such  as  transponders  which  identify  friendly  forces  by 
producing  a  certain  signal.  This  research  considers  non-cooperative  means  of  target 
classification. 

Specifically,  the  research  presented  here  uses  synthetic  aperture  radar  (SAR) 
imagery of ground targets collected from an airborne sensor. The imagery has been
pre-processed to present the researcher with a detected target in each image. Thus,
the  focus  is  not  target  detection,  but  rather  target  classification. 

This research advances the field of pattern recognition by developing a temporally-fused,
multiple classifier CID system using hidden Markov models (HMMs) operating
on features drawn from SAR images of ground targets taken at various aspect
angles. Sequencing the target observations by aspect angle generates a temporal
sensor-target relationship. A real-world application is classification of ground targets
by an airborne sensor in a multi-look, or sequence of observations, setting with
an unknown relative initial aspect angle of target to sensor.

This  research  also  explores  the  impact  of  extended  operating  conditions  (EOCs) 
on  CID  systems.  Sensor  observation  of  a  specific  ground  target  presents  different 
signatures  depending  on  the  sensor-target  orientation,  target  class  variant,  target 
articulation,  and  the  surrounding  clutter  environment.  A  ground  vehicle  has  different 
signatures  when  observed  head-on  versus  from  its  flank.  Similarly,  a  turreted  target 
has  different  signatures  if  its  barrel  is  in-line  with  the  body  versus  rotated  askew  the 
body.  The  EOCs  are  real-world  considerations  which  degrade  classifier  performance 
due  to  variations  in  the  target.  Synthesized  data  typically  adds  white  noise  to  the 
signature  that  masks  the  target,  while  EOCs  present  targets  whose  signatures  vary 
from the in-library exemplars.

1.4  Approach 

The first step of the research process is a review of the pertinent literature,
presented in Chapter 2. Four areas are reviewed:

• Hidden Markov models as time series classifiers and their theory and application

• High range-resolution radar signature processing and its use in target recognition

• Model selection as discussed in information theory literature

•  Multiple  classifier  systems,  sensor  fusion,  and  other  CID  issues 

The  review  highlights  current  research  efforts,  develops  supporting  theory,  and  points 
to  areas  that  contribute  to  this  dissertation  research. 

Next, the temporally-fused CID system is designed using a heuristic methodology
to specify the model. The methodology focuses on classifier performance while
selecting  model  complexity  and  structure.  In  addition,  the  CID  system  incorporates 
an  in-library  versus  out-of-library  discriminator  designed  using  a  separate  heuristic 
methodology  focused  on  two-class  separability.  The  out-of-library  discriminator  is 
used to extend Laine’s CID optimization framework [15] by including both an
“out-of-library” label and the associated warfighter constraint used in optimizing the CID
system. 

By exploring separate design methodologies, the research finds robust architectures
that perform well in an EOC setting, where the in-library target data is significantly
different from the in-library training data and may include out-of-library
exemplars. 

1.5 Contributions

Contributions  from  this  dissertation  research  are  in  the  following  areas: 

•  Development  of  an  HMM-based  time  series  classifier 

• Extension of Laine’s CID optimization framework to include out-of-library
performance

•  Development  of  an  out-of-library  classification  methodology 

• Development of a target pose-estimation methodology using principal component
analysis

• Application of the extended framework to a multi-class ATR experiment that
competes  the  HMM-based  classifier  against  a  template-based  classifier 

•  Development  of  the  framework  to  allow  classifiers  to  make  reject,  or  not  declare, 
decisions,  to  test  classifiers  against  out-of-library  records,  and  to  measure  the 
performance  of  three  different  fusion  methods 

•  Development  of  evidence  for  independent  optimal  threshold  settings  for  label 
fusion 

A  comprehensive  review  of  the  literature  covers  the  theory  and  development 
of  hidden  Markov  models.  The  application  of  HMMs  to  ATR  problems  using  high 
range-resolution  radar  signatures  as  features  is  described  in  Sec.  2.1.3.10,  and  it 
reveals  limitations  in  treatment  of  prior  knowledge  of  target  aspect,  inclusion  of  a 
rejection  option,  and  performance  considering  out-of-library  targets.  Other  research 
areas  covered  in  the  literature  review  include  model  complexity  in  HMMs,  multiple 
classifier  fusion,  rejection  theory,  and  Laine’s  CID  optimization  framework. 

Chapter  3  describes  the  development  of  an  HMM-based  time  series  classifier. 
Ultimately, the methodology results in a multi-dimensional Gaussian HMM operating
on HRR-derived feature data. The model takes as input a sequence of feature
data  ordered  by  target  aspect  angle.  The  model  establishes  a  relation  between  the 
observation  distribution  associated  with  each  hidden  state  and  the  signature  of  the 
target  within  a  range  of  aspect  angle. 

Chapter  4  extends  Laine’s  CID  optimization  framework  by  including  an  out- 
of-library  performance  measure.  The  framework  retains  the  desired  characteristic  of 
allowing  trade-off  analysis  without  explicit  classification  error  costs. 

Section 4.3.2.5 describes a methodology whereby a classifier assigns an estimated
posterior probability of out-of-library class membership to a test record. This
methodology is implemented as a post-processing step after the classifier trained on
in-library classes has adjudicated the test record. The methodology produces the
estimated out-of-library posterior probability as a function of the in-library class
posterior  probabilities  produced  by  the  classifier. 

Section  5.4.5  develops  a  method  to  estimate  target  aspect  angle  based  on  a 
target  mask  of  a  SAR  image.  The  method  uses  principal  component  analysis  to  find 
the  major  axis  of  the  target  mask.  An  initial  experiment  finds  pose  estimation  error 
to be roughly 11°.
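
A minimal sketch of the major-axis idea follows. It assumes the target mask is a binary NumPy array and that the orientation of the first principal component of the mask pixel coordinates serves as the pose estimate. The function name estimate_pose_deg is hypothetical, and the preprocessing used in Section 5.4.5 may differ. The principal axis alone cannot distinguish front from rear, so the estimate is reported modulo 180 degrees.

```python
import numpy as np


def estimate_pose_deg(mask):
    """Estimate the major-axis orientation of a binary target mask, in degrees.

    The angle is measured in image coordinates (x = column, y = row) and is
    returned in [0, 180), since the principal axis has no preferred direction.
    """
    rows, cols = np.nonzero(mask)
    coords = np.column_stack((cols, rows)).astype(float)  # (x, y) per mask pixel
    coords -= coords.mean(axis=0)                         # center on the centroid

    cov = np.cov(coords, rowvar=False)                    # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)                # eigenvalues in ascending order
    major = eigvecs[:, np.argmax(eigvals)]                # direction of largest variance

    angle = np.degrees(np.arctan2(major[1], major[0]))
    return float(angle % 180.0)


if __name__ == "__main__":
    demo = np.zeros((50, 50), dtype=bool)
    demo[5:45, 20:25] = True          # tall, narrow target mask
    print(estimate_pose_deg(demo))
```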

Chapter  5  details  the  application  of  the  extended  CID  framework  to  an  ATR 
experiment  using  DCS  radar  SAR  data.  The  experiment  competes  an  HMM-based 
system  (a  derivative  of  the  Chapter  3  system)  against  a  template-based  classifier. 
The  extended  framework  allows  the  systems  to  be  compared  inclusive  of  warfighter 
constraints,  rejection  option,  and  out-of-library  target  records.  Results  show  that  the 
HMM-based  system  provides  the  warfighter  with  better  and  more  robust  performance 
across  a  variety  of  experiment  settings,  including  fusion  rule,  hostile/friend  class 
prevalence,  observation  length,  and  prior  knowledge  of  target  aspect  angle.  Also,  the 
size  of  feasible  region  in  the  threshold  space  provides  a  simple  comparative  measure  of 
classifier  robustness,  and  performance  surfaces  efficiently  communicate  performance 
information  and  trade-space. 

Laine’s research [15] has shown that independent thresholding for each classifier
prior to applying a label fusion rule allows improved performance over the application
of single thresholding after the fusion of classifier outputs. Section 5.6.2.5 shows
that  independent  thresholding  enables  each  classifier  to  use  optimal  thresholds  in 
different  locations  in  the  threshold  space.  This  added  flexibility  allows  the  label 
fusion  method  to  combine  a  classifier  whose  threshold  setting  allows  it  to  perform 
well  in  one  performance  measure,  but  poorly  elsewhere,  with  a  second  classifier  whose 
threshold  setting  allows  it  to  perform  well  in  another  performance  area. 
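
The idea of thresholding each classifier independently before label fusion can be sketched as follows. The fusion rule, the function names, and all threshold values are hypothetical illustrations and are not the rules or settings evaluated in Section 5.6.2.5. The sketch assumes each classifier reports a posterior probability that the target is hostile.

```python
def threshold_label(p_hostile, t_hostile, t_friend):
    """Label one classifier output using that classifier's own thresholds."""
    if p_hostile >= t_hostile:
        return "hostile"
    if p_hostile <= t_friend:
        return "friend"
    return "non-declared"


def fuse_labels(label_a, label_b):
    """Hypothetical label fusion rule: agree, defer to the declaring
    classifier, or reject when the two declared labels conflict."""
    if label_a == label_b:
        return label_a
    if label_a == "non-declared":
        return label_b
    if label_b == "non-declared":
        return label_a
    return "non-declared"


def classify(p1, p2, thresholds1=(0.9, 0.4), thresholds2=(0.6, 0.1)):
    """Threshold each classifier independently, then fuse the labels.

    thresholds1 and thresholds2 are (t_hostile, t_friend) pairs; they sit in
    different locations of the threshold space on purpose (assumed values).
    """
    label1 = threshold_label(p1, *thresholds1)
    label2 = threshold_label(p2, *thresholds2)
    return fuse_labels(label1, label2)


if __name__ == "__main__":
    print(classify(0.95, 0.30))   # classifier 1 declares, classifier 2 abstains
```

Because each classifier keeps its own pair of thresholds, one can be tuned to declare hostiles conservatively while the other is tuned to declare friends conservatively. A single threshold applied after fusion has no such freedom.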

1.6 Organization


The  remainder  of  the  document  is  organized  as  follows: 

Chapter  2  provides  background  instruction,  describes  key  supporting  areas  of 
the proposed research, and provides a review of the current literature. The supporting
research areas include: hidden Markov models (their theory and application),
high range-resolution radar profiles (their processing and use in automatic target
recognition (ATR)), model selection and model complexity as discussed in information
theory, and the design of multiple classifier systems (or sensor fusion).

Chapter  3  describes  a  heuristic  approach  to  the  design  and  development  of  an 
HMM-based  classifier.  First,  an  example  application  of  an  HMM-based  classifier  to 
sequences  of  genetic  data  is  given.  Next,  model  selection  theory  is  applied  in  the 
choice  of  HMM  design.  Finally,  a  series  of  refinements  to  the  HMM  design  are  made 
with  regard  to  assumptions  and  proven  performance. 

Chapter 4 describes the proposed classifier and CID framework extension. The
proposed HMM-based classifier is presented as part of an extended CID optimization
framework which includes both a rejection option and out-of-library exemplars.

Chapter  5  considers  application  to  DSC  data  and  a  competitor,  where  the 
proposed  classifier  is  compared  to  a  template-based  classifier  using  SAR  data  from 
a  2004  collection  within  the  extended  CID  framework. 

Chapter  6  presents  a  summary  of  findings,  discusses  research  contributions, 
and proposes future research areas stemming from this work.

2. Background

This  chapter  reviews  pertinent  literature,  provides  background  information,  and  is 
organized  by  research  area.  First,  hidden  Markov  models  as  time  series  classifiers  are 
introduced,  related  literature  is  reviewed,  and  supporting  theory  is  shown.  Second, 
high  range-resolution  radar  as  a  source  of  classification  features  is  covered.  Third, 
model  selection  in  the  context  of  information  theory  is  defined  and  related  literature 
is  reviewed.  Finally,  basic  concepts  and  taxonomies  of  sensor  fusion  and  multiple 
classifier  systems  are  covered. 

2.1  Hidden  Markov  models 

2.1.1  Introduction 

An important aspect of combat identification (CID) is the incorporation of temporal
target observations into the classification process. In his dissertation Fielding
proves that given a sequence of observations in which there is a provable dependency,
the entropy of the joint observations is less than the sum of the entropies of the
individual observations [17]. Thus, a classifier operating on the greater source of
information (less entropy) will have equal or greater classification power than
single-look methods.
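
The entropy relationship can be checked numerically on a toy example. The sketch below assumes two binary observations with a hypothetical joint distribution chosen to be dependent. It illustrates the general inequality H(X1, X2) <= H(X1) + H(X2) and is not a reproduction of Fielding's proof.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginals(joint):
    """Marginal distributions of a joint pmf over pairs (x1, x2)."""
    m1, m2 = {}, {}
    for (x1, x2), p in joint.items():
        m1[x1] = m1.get(x1, 0.0) + p
        m2[x2] = m2.get(x2, 0.0) + p
    return list(m1.values()), list(m2.values())


# Hypothetical joint pmf: the two looks tend to agree, so they are dependent.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Independent looks with the same marginals, for comparison.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

if __name__ == "__main__":
    for name, joint in (("dependent", dependent), ("independent", independent)):
        m1, m2 = marginals(joint)
        h_joint = entropy(joint.values())
        h_sum = entropy(m1) + entropy(m2)
        print(f"{name}: joint = {h_joint:.4f} bits, sum of marginals = {h_sum:.4f} bits")
```

For the dependent case the joint entropy falls below the sum of the marginal entropies, and for the independent case the two are equal.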

A  review  of  the  pattern  recognition  literature  in  search  of  time  series  classifiers, 
or  classifiers  which  incorporate  data  order,  yields  hidden  Markov  models  (HMMs) 
as  the  primary  classifier  for  the  research  presented  here.  In  the  following  sections 
hidden  Markov  models  are  introduced,  their  mathematical  development  is  given,  and 
HMM  applications  in  the  field  of  automatic  target  recognition  are  reviewed. 

2.1.2  Literature 

Hidden  Markov  models  fit  into  a  broad  class  of  statistical  signal  models  which 
also includes Gaussian processes, Poisson processes and Markov processes. These
models seek to characterize a signal as a parametric random process whose parameters
can be estimated (or determined) in a well-defined manner [18]. HMMs are
statistical representations of time series data. An HMM is used to represent probability
distributions given a sequence, or many sequences, of observations. The
fundamental  property  of  HMMs  is  the  assumption  that  the  sequence  of  observations 
is  a  noisy  function  of  a  Markov  chain  which  is  not  directly  observed,  or  hidden. 
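
The generative assumption can be made concrete with a short simulation. The sketch below assumes a two-state chain with scalar Gaussian observations. The function name sample_gaussian_hmm and all parameter values are hypothetical and are not taken from the models developed later in this document.

```python
import numpy as np


def sample_gaussian_hmm(pi, A, means, stds, T, rng):
    """Draw a hidden state path and noisy observations from a Gaussian HMM.

    pi: initial state probabilities, shape (N,)
    A: state transition matrix, A[i, j] = P(next state j | current state i)
    means, stds: per-state Gaussian observation parameters, shape (N,)
    """
    states = np.empty(T, dtype=int)
    obs = np.empty(T)
    states[0] = rng.choice(len(pi), p=pi)
    obs[0] = rng.normal(means[states[0]], stds[states[0]])
    for t in range(1, T):
        # The Markov chain evolves without being observed directly ...
        states[t] = rng.choice(len(pi), p=A[states[t - 1]])
        # ... and each observation is a noisy function of the current state.
        obs[t] = rng.normal(means[states[t]], stds[states[t]])
    return states, obs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pi = np.array([0.5, 0.5])
    A = np.array([[0.9, 0.1], [0.2, 0.8]])
    states, obs = sample_gaussian_hmm(pi, A, np.array([0.0, 3.0]),
                                      np.array([1.0, 1.0]), T=10, rng=rng)
    print(states)
    print(np.round(obs, 2))
```

Only the observation sequence would be available to a classifier; the state path is printed here purely to show what is hidden.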

The  literature  contains  several  HMM  tutorial  articles  and  texts.  Rabiner’s 
tutorial on HMMs [18] is widely cited; it gives an introduction to HMMs and applies
them in a speech recognition application. A more recent article on HMMs and their
development  is  in  Ghahramani’s  paper  [19].  A  small  section  introducing  HMMs  in 
Duda  and  Hart’s  classic  pattern  recognition  text  [20]  is  useful  for  its  diagrams. 

Two  texts  wholly  dedicated  to  HMMs  are  useful  in  researching  the  variety 
of  specialized  HMMs  and  their  applications.  Elliott’s  text  [21]  focuses  on  signal 
processing applications of HMMs, and MacDonald’s text [22] focuses on discrete-valued
time series applications. Both provide excellent mathematical development
for  HMMs. 

A  history  of  HMMs  begins  with  their  introduction  as  probabilistic  functions  of 
Markov  chains  in  Baum  and  Petrie’s  1966  paper  [23].  Later,  Baum,  Petrie,  Soules, 
and  Weiss  introduced  a  method  to  calculate  the  conditional  probability  of  a  state 
given a sequence of observations [24]. In the same paper, they showed how to efficiently
estimate the parameters of an HMM. The algorithm, called alternately the
Baum algorithm, the Baum-Petrie algorithm, or the Baum-Welch algorithm, is the
expectation maximization algorithm of Dempster, Laird, and Rubin [25] applied to
HMMs. Local convergence of the algorithm was proved [24], and later work [26, 27]
proved the consistency and asymptotic normality of the maximum likelihood estimators
of the HMM parameters.

Ephraim  and  Merhav  provide  a  well-referenced  overview  of  HMMs  and  their 
applications [28]. HMMs have been applied in a number of research areas.
State-of-the-art speech recognition engines employ HMMs [18, 29, 30, 31] to match spoken
words with stored language. The vast amounts of data generated in efforts to
map genetic material are sorted by structure and purpose in an area of study called
computational biology, and HMMs play a major role in the effort [32, 33, 34].
Examples of pattern recognition applications of HMMs are found in Arica [35] and
Cai  [36]  where  HMMs  are  used  for  character  recognition,  Hu  [37]  where  HMMs  are 
employed  to  classify  facial  emotions,  and  Krishnamurthy  [38]  where  HMMs  process 
signal  information  in  the  presence  of  noise.  Gader  [39]  applies  HMMs  with  ground 
penetrating  radar  to  classify  mine  types. 

2.1.3  Theory 

This section draws from several sources in developing HMM notation, parameterization,
and mathematical development. Rabiner [18] and Ghahramani [19] provide
outstanding tutorials on HMMs and their applications. Bilmes [40] and Elliott [21]
provide helpful development of HMM algorithms and their convergence
theory. 

2.1.3.1  Definition  and  notation

The  notation  follows  the  stochastic  literature,  specifically  Kulkarni’s  stochastic 
system  analysis  text  [41],  and  a  blend  of  HMM  notation  as  found  in  Elliott’s  text  [21] 
and  Rabiner’s  HMM  tutorial  [18]. 

In developing the theory of HMMs we begin with a stochastic process $\{X_n, n \ge 0\}$,
where $X_n$ denotes the state of the system at time $n$ and where for all $n \ge 0$, $X_n$
is a random variable taking values in set $S$. We further assume $\{X_n, n \ge 0\}$ is a
discrete-time Markov chain (DTMC) with finite state space $S$ such that

1. for all $n \ge 0$, $X_n \in S$ with probability 1,

2. $P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\} = P\{X_{n+1} = j \mid X_n = i\}$ for all
$n \ge 0$ and $i, j \in S$, which is the first order Markov property.

Next we introduce the idea of time-homogeneity. A DTMC is time-homogeneous
when the conditional probabilities $P\{X_{n+1} = j \mid X_n = i\}$ are independent of time,
$n \ge 0$, for all $i, j \in S$.

In an HMM the finite-state space, time-homogeneous DTMC $\{X_n, n \ge 0\}$ is
hidden and can only be observed through an additional stochastic process, $\{Y_n, n \ge 0\}$,
which is a sequence of conditionally independent random variables with the
conditional distribution of $Y_n$ depending on the hidden DTMC $\{X_n, n \ge 0\}$ only
through the state at time $n$, $X_n$. In a discrete HMM the state space of $Y_n$ is finite.

Thus the discrete-time stochastic process $\{(X_n, Y_n), n \ge 0\}$ forms a hidden
Markov model where $\{X_n, n \ge 0\}$ is a hidden time-homogeneous DTMC with finite
state space and $\{Y_n, n \ge 0\}$ is an observation sequence dependent on the state of
the hidden DTMC at time $n$.

2.1.3.2  Parameterization

To parameterize a discrete HMM a stochastic transition matrix, $A = [a_{ij}]$,
where $a_{ij} = P\{X_{n+1} = j \mid X_n = i\}$, is used to represent the hidden Markov chain. In
addition to the hidden state transition matrix, $A$, an observation distribution matrix,
$B$, is defined by $[b_{ji}] = P\{Y_n = i \mid X_n = j\}$ where each observation $Y_n$ of the sequence
$\{Y_n, n \ge 0\}$ is one of $Q$ possible values. Finally, the parameterization of a discrete
HMM must include an initial state distribution, $\pi_i = P\{X_0 = i\}$, which provides
the starting point for the hidden Markov chain. Thus, a discrete HMM with $S$
hidden states and $Q$ observation states is parameterized by a hidden state transition
matrix, $A = [a_{ij}] \in \mathbb{R}^{S \times S}$, an observation distribution matrix, $B = [b_{ji}] \in \mathbb{R}^{S \times Q}$, and
an initial state distribution, $\pi \in \mathbb{R}^{S}$. The complete set of parameters for a given
model is written, $\lambda = (A, B, \pi)$. Figure 3 provides a schematic representation of
hidden Markov model parameterization and functioning.

Figure 3. Trellis diagram of a discrete HMM with 4 hidden states, 4 observation
symbols $\{y_1, y_2, y_3, y_4\}$, and sequence length $T$. Here $A$ is the hidden
state transition matrix and governs the progression of the hidden Markov
chain, $B$ is the observation distribution matrix and governs the sequence
of observed symbols, and $\pi$ is the initial state distribution and governs
the starting state of the hidden Markov chain.
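As a minimal sketch of the parameterization $\lambda = (A, B, \pi)$, the Python fragment below stores a discrete HMM as plain lists, checks the stochastic constraints, and draws a hidden state path together with its observations. The numerical values, the helper names `is_stochastic` and `sample_hmm`, and the zero-based state and symbol indices are illustrative assumptions and are not taken from the text.

```python
import random

# Hypothetical 2-state, 3-symbol model; the values are for illustration only.
A = [[0.7, 0.3],
     [0.4, 0.6]]          # A[i][j] = P{X_{n+1} = j | X_n = i}
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]     # B[j][k] = P{Y_n = k | X_n = j}
pi = [0.6, 0.4]           # pi[i] = P{X_0 = i}


def is_stochastic(rows, tol=1e-12):
    """Return True when every row is a probability distribution."""
    return all(min(r) >= 0.0 and abs(sum(r) - 1.0) <= tol for r in rows)


def sample_hmm(A, B, pi, length, rng):
    """Draw (states, observations) of the given length from the model."""
    states, observations = [], []
    state = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(length):
        states.append(state)
        # The observation depends on the hidden chain only through the current state.
        observations.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        # The next hidden state depends only on the current state (Markov property).
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return states, observations


rng = random.Random(0)
states, observations = sample_hmm(A, B, pi, length=10, rng=rng)
print(states)
print(observations)
```

In an application only `observations` would be available; `states` is kept here to make the hidden chain visible.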


2.1.3.3  Basic  HMM  problems

Rabiner discusses three basic problems associated with HMMs. The following
derivations with slight modification to the notation can be found in his tutorial [18]:

•  Evaluation  Given a model $\lambda$, evaluate the probability $P\{Y \mid \lambda\}$ of producing
a specified observation sequence of length $T$, $Y \in \{Y_n\}_1^T$. Note that we have
adjusted the time index to begin at $n = 1$ so that a sequence from time $n = 1$
to $n = T$ is of length $T$ and not $T + 1$.

•  Decoding  Given a model $\lambda$ and an observation sequence $Y \in \{Y_n\}_1^T$, find the
best state sequence $X \in \{X_n\}_1^T$ that explains $Y$.

•  Learning  Given an observation sequence, or set of observation sequences,
$\{Y\}$, parameterize an HMM, $\lambda^* = (A, B, \pi)$, such that it is the most likely
model to have produced the given data. This amounts to training an HMM
given observation data.

2.1.3.4  Evaluation  Problem

The first HMM problem seeks to find the probability of producing an observation
sequence of length $T$, $Y \in \{Y_n\}_1^T$, given a model, $\lambda = (A, B, \pi)$. The probability
of producing an observation sequence $Y = Y_1 Y_2 \ldots Y_T$ given a hidden state sequence
$X \in \{X_n\}_1^T$ with $X = X_1 X_2 \ldots X_T$ and the model, $\lambda = (A, B, \pi)$, is

$$P\{Y \mid X, \lambda\} = \prod_{n=1}^{T} P\{Y_n \mid X_n, \lambda\} = \prod_{n=1}^{T} b_{X_n Y_n} \tag{1}$$

where $b_{ji} \in B$, the observation distribution matrix. Thus $P\{Y \mid X, \lambda\} = b_{X_1 Y_1} b_{X_2 Y_2} \cdots b_{X_T Y_T}$,
which can be joined with the probability of a hidden state sequence given a model,
$P\{X \mid \lambda\} = \pi_{X_1} a_{X_1 X_2} a_{X_2 X_3} \cdots a_{X_{T-1} X_T}$, to yield the joint probability of an observation
sequence and a hidden state sequence given a model

$$P\{Y, X \mid \lambda\} = P\{Y \mid X, \lambda\} \cdot P\{X \mid \lambda\}. \tag{2}$$

By summing over all hidden state sequences, the exact formulation for $P\{Y \mid \lambda\}$ is

$$P\{Y \mid \lambda\} = \sum_{\text{all } X \in \{X_n\}} P\{Y \mid X, \lambda\} \cdot P\{X \mid \lambda\}
= \sum_{\text{all } X_1 X_2 \ldots X_T \in \{X_n\}} \pi_{X_1} b_{X_1 Y_1} \cdot a_{X_1 X_2} b_{X_2 Y_2} \cdots a_{X_{T-1} X_T} b_{X_T Y_T}. \tag{3}$$

While this process yields an exact solution, it requires an intractable number of
calculations, $2T \cdot S^T$, where $T$ is the sequence length and $S$ is the number of hidden
states in the Markov chain. A recursive algorithm, called the forward procedure,
reduces the number of necessary calculations to a manageable level and provides an
efficient means of calculating $P\{Y \mid \lambda\}$.

The forward variable, $\alpha_i(n)$, is defined as the probability of observing a partial
sequence to time $n$ and being in hidden state $i$ at time $n$ given a model $\lambda$, or

$$\alpha_i(n) = P\{Y_1 Y_2 \ldots Y_n, X_n = i \mid \lambda\}. \tag{4}$$

Each $\alpha_i(n)$, $1 \le n \le T$, can be defined recursively given an initial $\alpha_i$.

1. Initialization step: $\alpha_i(1) = \pi_i b_{iY_1}$, which is the probability of starting in state $i$
and observing $Y_1$.

2. Inductive step: $\alpha_j(n+1) = \left[\sum_{i=1}^{S} \alpha_i(n) a_{ij}\right] b_{jY_{n+1}}$, which is the probability of
being in state $j$ at time $n+1$ and observing the sequence $Y_1 Y_2 \ldots Y_{n+1}$. The
bracketed portion describes the probability of arriving at state $j$ at time $n+1$
from state $i$ at time $n$. By necessity the Markov chain must be in one of $S$
states at time $n$. Summing $\alpha_i(n)$ over $S$ states accounts for all possible one-step
starting points at time $n$. Multiplying by $b_{jY_{n+1}}$ concludes the inductive
step by incorporating the probability of being in state $j$ at time $n+1$ and
observing $Y_{n+1}$.

3. Termination step: $P\{Y \mid \lambda\} = \sum_{i=1}^{S} \alpha_i(T)$ where $T$ is the observation sequence
length.

The recursive algorithm reduces the computational complexity of finding $P\{Y \mid \lambda\}$
from $2T \cdot S^T$ to $S^2 T$. Figure 4 uses the trellis schematic to illustrate calculation of
the forward variable.

Figure 4. Diagram illustrating the hidden Markov chain propagation from time $n$
to $n+1$ for the forward variable. The process begins in state $i$, one
of $S$ states, and transitions according to the state transition matrix,
$A = [a_{ij}]$, to state $j$.
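The forward procedure can be sketched in a few lines of Python and checked against the exhaustive sum over all hidden state sequences. The model values, the function names `forward` and `brute_force_likelihood`, and the zero-based indexing of states and symbols are assumptions made for this illustration only.

```python
from itertools import product


def forward(A, B, pi, obs):
    """Return the forward variables alpha[n][i] and P{Y | lambda}."""
    S = len(pi)
    # Initialization: alpha_i(1) = pi_i * b_{i Y_1}
    alpha = [[pi[i] * B[i][obs[0]] for i in range(S)]]
    # Induction: alpha_j(n+1) = [sum_i alpha_i(n) a_ij] * b_{j Y_{n+1}}
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(S)) * B[j][y]
                      for j in range(S)])
    # Termination: P{Y | lambda} = sum_i alpha_i(T)
    return alpha, sum(alpha[-1])


def brute_force_likelihood(A, B, pi, obs):
    """Sum P{Y, X | lambda} over all S**T hidden state sequences."""
    S, total = len(pi), 0.0
    for path in product(range(S), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for n in range(1, len(obs)):
            p *= A[path[n - 1]][path[n]] * B[path[n]][obs[n]]
        total += p
    return total


# Hypothetical model with zero-based states and symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 2, 1, 1, 2]

alpha, likelihood = forward(A, B, pi, obs)
print(likelihood, brute_force_likelihood(A, B, pi, obs))
```

The two printed values should agree to rounding error. The exhaustive sum visits $S^T$ paths, while the recursion does on the order of $S^2 T$ multiplications. For long sequences the unscaled $\alpha_i(n)$ underflow, so a practical implementation would rescale or work with logarithms.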

2.1.3.5  Decoding  Problem

The backward variable is similar to the forward variable and plays a role in the
solution to the second HMM problem: given a model $\lambda$ and an observation sequence
$Y \in \{Y_n\}_1^T$, find the state sequence $X \in \{X_n\}_1^T$ that best explains $Y$.

The backward variable $\beta_i(n)$ is defined as the probability of observing a partial
sequence from time $n+1$ to time $T$ given a model $\lambda$ and that the model is in state
$i$ at time $n$, or

$$\beta_i(n) = P\{Y_{n+1} Y_{n+2} \ldots Y_T \mid X_n = i, \lambda\}. \tag{5}$$

Again, three steps are used to efficiently calculate $P\{Y \mid \lambda\}$ using the newly defined
backward variable (see Fig. 5):

1. Initialization step: $\beta_i(T) = 1$, which arbitrarily assigns a probability of 1 to
each partial sequence.

2. Inductive step: $\beta_i(n) = \sum_{j=1}^{S} a_{ij} b_{jY_{n+1}} \beta_j(n+1)$, which is the probability of
observing the partial sequence $Y_{n+1} Y_{n+2} \ldots Y_T$ given the model and that the
model is in state $i$ at time $n$. The product $a_{ij} b_{jY_{n+1}}$ gives the probability
of making a single time-step transition from state $i$ to state $j$ and observing
$Y_{n+1}$. Multiplying further by $\beta_j(n+1)$ incorporates the recursive element which
accounts for the remaining sequence steps. Summing over the $S$ possible states
accounts for the $S$ possible one-step end states from time $n$ to time $n+1$.

3. Termination step: $P\{Y \mid \lambda\} = \sum_{i=1}^{S} \pi_i b_{iY_1} \beta_i(1)$.

Figure 5. Diagram illustrating the hidden Markov chain propagation from time $n$
to $n+1$ for the backward variable. The process begins in state $i$ and
transitions according to the state transition matrix, $A = [a_{ij}]$, to state
$j$, one of $S$ states.
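A short Python sketch of the backward recursion follows. It closes with the identity $P\{Y \mid \lambda\} = \sum_{i} \pi_i b_{iY_1} \beta_i(1)$, which follows from the definition of $\beta_i(1)$, and compares the result with the forward termination step. The model values, function names, and zero-based indices are assumptions for illustration.

```python
def forward_likelihood(A, B, pi, obs):
    """P{Y | lambda} from the forward procedure."""
    S = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(S)]
    for y in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(S)) * B[j][y]
                 for j in range(S)]
    return sum(alpha)


def backward(A, B, obs):
    """Return beta[n][i] for zero-based time n = 0, ..., T-1."""
    S, T = len(A), len(obs)
    # Initialization: beta_i(T) = 1
    beta = [[1.0] * S for _ in range(T)]
    # Induction, moving backward in time:
    # beta_i(n) = sum_j a_ij * b_{j Y_{n+1}} * beta_j(n+1)
    for n in range(T - 2, -1, -1):
        beta[n] = [sum(A[i][j] * B[j][obs[n + 1]] * beta[n + 1][j]
                       for j in range(S))
                   for i in range(S)]
    return beta


# Hypothetical model with zero-based states and symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 2, 1, 1, 2]

beta = backward(A, B, obs)
# Termination: P{Y | lambda} = sum_i pi_i * b_{i Y_1} * beta_i(1)
p_backward = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
p_forward = forward_likelihood(A, B, pi, obs)
print(p_backward, p_forward)
```

Both passes should give the same likelihood up to rounding, which is a convenient sanity check when implementing either recursion.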

An additional variable, $\gamma_i(n)$, is needed to solve the second HMM problem,
where $\gamma_i(n)$ is defined as the probability of being in state $i$ at time $n$ given the
observation sequence $Y \in \{Y_n\}_1^T$ and the model $\lambda$, or

$$\gamma_i(n) = P\{X_n = i \mid Y, \lambda\}. \tag{6}$$

Note that,

$$\gamma_i(n) = \frac{P\{Y, X_n = i \mid \lambda\}}{P\{Y \mid \lambda\}} = \frac{P\{Y, X_n = i \mid \lambda\}}{\sum_{j=1}^{S} P\{Y, X_n = j \mid \lambda\}} \tag{7}$$

and with the following use of the forward and backward variables,

$$\alpha_i(n) \cdot \beta_i(n) = P\{Y_1 Y_2 \ldots Y_n, X_n = i \mid \lambda\} \cdot P\{Y_{n+1} Y_{n+2} \ldots Y_T \mid X_n = i, \lambda\} = P\{Y, X_n = i \mid \lambda\},$$

we can write

$$\gamma_i(n) = \frac{\alpha_i(n) \cdot \beta_i(n)}{\sum_{j=1}^{S} \alpha_j(n) \cdot \beta_j(n)}. \tag{8}$$

Then, to solve the second HMM problem and find the sequence of the individually
most likely hidden states, $\{X^*\} = X_1 X_2 \ldots X_T$, we make the following comparison
at each step of the sequence

$$X_n = i \text{ such that } i = \operatorname*{argmax}_{1 \le i \le S} \left[\gamma_i(n)\right] \text{ for } 1 \le n \le T. \tag{9}$$
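The decoding rule of Eq. 9 can be sketched as follows: compute $\alpha$ and $\beta$, normalize their product at each time step to obtain $\gamma_i(n)$, and keep the state with the largest posterior. The model values, the function name `posterior_decode`, and zero-based indices are assumptions for illustration.

```python
def forward(A, B, pi, obs):
    """Forward variables alpha[n][i] for zero-based time n."""
    S = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(S)]]
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(S)) * B[j][y]
                      for j in range(S)])
    return alpha


def backward(A, B, obs):
    """Backward variables beta[n][i] for zero-based time n."""
    S, T = len(A), len(obs)
    beta = [[1.0] * S for _ in range(T)]
    for n in range(T - 2, -1, -1):
        beta[n] = [sum(A[i][j] * B[j][obs[n + 1]] * beta[n + 1][j]
                       for j in range(S))
                   for i in range(S)]
    return beta


def posterior_decode(A, B, pi, obs):
    """Return gamma[n][i] and the individually most likely state at each n."""
    gamma, path = [], []
    for a_n, b_n in zip(forward(A, B, pi, obs), backward(A, B, obs)):
        joint = [a * b for a, b in zip(a_n, b_n)]   # P{Y, X_n = i | lambda}
        total = sum(joint)                          # P{Y | lambda}
        row = [p / total for p in joint]
        gamma.append(row)
        path.append(max(range(len(row)), key=row.__getitem__))
    return gamma, path


# Hypothetical model with zero-based states and symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
obs = [0, 0, 1, 1, 1, 0]

gamma, path = posterior_decode(A, B, pi, obs)
print(path)
```

Because each state is chosen independently of its neighbours, the resulting sequence maximizes the expected number of correct states but is not guaranteed to be the single most probable joint path; Rabiner's tutorial [18] treats that variant with the Viterbi algorithm.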

2.1.3.6  Learning  Problem

The third and most complicated HMM problem seeks to update the model
parameters, $\lambda = (A, B, \pi)$, to maximize the probability of the observation sequence
given the model. The most commonly used algorithm for this task is the Baum-Welch
algorithm [23, 24]. Its derivation is shown here in two separate ways: first,
following the notation used thus far, and second, in notation more appropriate to
expectation maximization studies.

An additional variable, $\xi_{ij}(n)$, is needed to solve the third HMM problem (see
Fig. 6). It is defined as the joint probability of being in state $i$ at time $n$ and being
in state $j$ at time $n+1$ given an observation sequence and a model, or

$$\xi_{ij}(n) = P\{X_n = i, X_{n+1} = j \mid Y, \lambda\}. \tag{10}$$

Expanding the definition using the given observation sequence gives

$$\xi_{ij}(n) = \frac{P\{X_n = i, X_{n+1} = j, Y \mid \lambda\}}{P\{Y \mid \lambda\}}. \tag{11}$$

Incorporating the forward and backward variables gives

$$\xi_{ij}(n) = \frac{\alpha_i(n) \cdot a_{ij} b_{jY_{n+1}} \cdot \beta_j(n+1)}{P\{Y \mid \lambda\}}. \tag{12}$$

Summing over all possible one-step state transitions yields

$$\xi_{ij}(n) = \frac{\alpha_i(n) \cdot a_{ij} b_{jY_{n+1}} \cdot \beta_j(n+1)}{\sum_{i=1}^{S} \sum_{j=1}^{S} \alpha_i(n) \cdot a_{ij} b_{jY_{n+1}} \cdot \beta_j(n+1)}. \tag{13}$$

Figure 6. Diagram illustrating the hidden Markov chain propagation from time $n$
to $n+1$ for $\xi_{ij}(n)$ incorporating the forward and backward variables.


Finally, given an observation sequence $Y$ and summing $\gamma_i(n)$ and $\xi_{ij}(n)$ over
time (i.e. over the entire sequence length $T$) yields the following results:

1. Given $Y$, $\sum_{n=1}^{T} \gamma_i(n)$ is the expected number of visits to state $i$ and, conversely,
is also the expected number of transitions away from state $i$.

2. Given $Y$, $\sum_{n=1}^{T} \xi_{ij}(n)$ is the expected number of transitions from state $i$ to
state $j$.

3. The expected relative frequency spent in state $i$ at time $n = 1$ forms an update
to the initial hidden state distribution, $\pi$,

$$\bar{\pi}_i = \gamma_i(1). \tag{14}$$

4. The expected number of transitions from state $i$ to state $j$ relative to the
expected number of transitions away from state $i$ forms an update to the hidden
state transition matrix $A$,

$$\bar{a}_{ij} = \frac{\sum_{n=1}^{T} \xi_{ij}(n)}{\sum_{n=1}^{T} \gamma_i(n)}. \tag{15}$$

5. The expected number of times the observation $i$ is observed while in state $j$
relative to the expected number of visits to state $j$ forms an update to the
observation distribution matrix $B$,

$$\bar{b}_{ji} = \frac{\sum_{n=1,\, Y_n = i}^{T} \gamma_j(n)}{\sum_{n=1}^{T} \gamma_j(n)}. \tag{16}$$

For an initial parameterization of the model $\lambda_0$ and for a given observation sequence
$Y$, updating the model using the above equations yields a new model $\lambda$ that
is more likely than $\lambda_0$ to have produced the observation sequence. By iteratively
applying the update equations, Baum et al. have shown that the model achieves
a local maximum in the likelihood function of the parameterized model given the
observation information [23, 24].
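One re-estimation pass over a single observation sequence can be sketched as below. This is an illustration under stated assumptions rather than the implementation used in this research: the model values and function names are hypothetical, indices are zero-based, no scaling is applied (so it is only suitable for short sequences), and the transition sums run over the $T-1$ transitions that a length-$T$ sequence actually contains.

```python
def forward(A, B, pi, obs):
    S = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(S)]]
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(S)) * B[j][y]
                      for j in range(S)])
    return alpha


def backward(A, B, obs):
    S, T = len(A), len(obs)
    beta = [[1.0] * S for _ in range(T)]
    for n in range(T - 2, -1, -1):
        beta[n] = [sum(A[i][j] * B[j][obs[n + 1]] * beta[n + 1][j]
                       for j in range(S))
                   for i in range(S)]
    return beta


def baum_welch_step(A, B, pi, obs):
    """One update of (A, B, pi); also returns P{Y | lambda} before the update."""
    S, Q, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # gamma[n][i] = P{X_n = i | Y, lambda}
    gamma = [[alpha[n][i] * beta[n][i] / likelihood for i in range(S)]
             for n in range(T)]
    # xi[n][i][j] = P{X_n = i, X_{n+1} = j | Y, lambda}
    xi = [[[alpha[n][i] * A[i][j] * B[j][obs[n + 1]] * beta[n + 1][j] / likelihood
            for j in range(S)] for i in range(S)] for n in range(T - 1)]
    new_pi = list(gamma[0])
    new_A = [[sum(xi[n][i][j] for n in range(T - 1))
              / sum(gamma[n][i] for n in range(T - 1))
              for j in range(S)] for i in range(S)]
    new_B = [[sum(gamma[n][j] for n in range(T) if obs[n] == k)
              / sum(gamma[n][j] for n in range(T))
              for k in range(Q)] for j in range(S)]
    return new_A, new_B, new_pi, likelihood


# Hypothetical starting model and observation sequence.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 0, 1, 2, 2, 1, 0, 2, 2, 2, 1, 0]

likelihoods = []
for _ in range(5):
    A, B, pi, likelihood = baum_welch_step(A, B, pi, obs)
    likelihoods.append(likelihood)
print(likelihoods)
```

The recorded likelihoods should be nondecreasing from one iteration to the next, which is the monotonicity property cited from Baum et al. [23, 24].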


2.1.3.7  Example  Problem


Given a two-state, discrete HMM, $\lambda = (A, B, \pi)$, parameterized by the hidden
state transition matrix

$$A = \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix} \text{ where } [a_{ij}] = P\{X_{n+1} = j \mid X_n = i\}, \tag{17}$$

the observation distribution matrix

$$B = \begin{bmatrix} 0.89 & 0.11 \\ 0.11 & 0.89 \end{bmatrix} \text{ where } [b_{ji}] = P\{Y_n = i \mid X_n = j\}, \tag{18}$$

and the initial state distribution vector

$$\pi = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \text{ where } [\pi_i] = P\{X_0 = i\}, \tag{19}$$

find $\gamma_i(n) = P\{X_n = i \mid Y, \lambda\}$, defined as the probability of being in state $i$ at time $n$
given the observation sequence

$$Y = (2, 2, 2, 1, 1, 2, 2, 2)$$

and the above model $\lambda = (A, B, \pi)$. First, note that $\gamma_i(n)$ can be written in terms
of the forward and backward variables

$$\gamma_i(n) = \frac{\alpha_i(n) \cdot \beta_i(n)}{\sum_{j=1}^{S} \alpha_j(n) \cdot \beta_j(n)}.$$

Second, determine the best estimate of the sequence of hidden states given the observation
sequence $Y$. The forward variable is defined as $\alpha_j(n+1) = \left[\sum_{i=1}^{S} \alpha_i(n) a_{ij}\right] b_{jY_{n+1}}$.
Table 1 iterates through the forward variable calculations.
The backward variable is defined as $\beta_i(n) = \sum_{j=1}^{S} a_{ij} b_{jY_{n+1}} \beta_j(n+1)$. Table 2 iterates
through the backward variable calculations.

Table 1. Forward variable calculations

n   $\alpha_1(n)$                                                                      $\alpha_2(n)$
0   $= 0.5$                                                                            $= 0.5$
1   $[0.9 \cdot \alpha_1(0) + 0.1 \cdot \alpha_2(0)] \cdot 0.11 = 0.055$               $[0.1 \cdot \alpha_1(0) + 0.9 \cdot \alpha_2(0)] \cdot 0.89 = 0.445$
2   $[0.9 \cdot \alpha_1(1) + 0.1 \cdot \alpha_2(1)] \cdot 0.11 = 0.0103$              $[0.1 \cdot \alpha_1(1) + 0.9 \cdot \alpha_2(1)] \cdot 0.89 = 0.3613$
3   $[0.9 \cdot \alpha_1(2) + 0.1 \cdot \alpha_2(2)] \cdot 0.11 = 0.005$               $[0.1 \cdot \alpha_1(2) + 0.9 \cdot \alpha_2(2)] \cdot 0.89 = 0.2904$
4   $[0.9 \cdot \alpha_1(3) + 0.1 \cdot \alpha_2(3)] \cdot 0.89 = 0.0299$              $[0.1 \cdot \alpha_1(3) + 0.9 \cdot \alpha_2(3)] \cdot 0.11 = 0.0288$
5   $[0.9 \cdot \alpha_1(4) + 0.1 \cdot \alpha_2(4)] \cdot 0.89 = 0.0265$              $[0.1 \cdot \alpha_1(4) + 0.9 \cdot \alpha_2(4)] \cdot 0.11 = 0.0032$
6   $[0.9 \cdot \alpha_1(5) + 0.1 \cdot \alpha_2(5)] \cdot 0.11 = 0.0027$              $[0.1 \cdot \alpha_1(5) + 0.9 \cdot \alpha_2(5)] \cdot 0.89 = 0.0049$
7   $[0.9 \cdot \alpha_1(6) + 0.1 \cdot \alpha_2(6)] \cdot 0.11 = 0.00032$             $[0.1 \cdot \alpha_1(6) + 0.9 \cdot \alpha_2(6)] \cdot 0.89 = 0.0042$
8   $[0.9 \cdot \alpha_1(7) + 0.1 \cdot \alpha_2(7)] \cdot 0.11 = 7.7 \cdot 10^{-5}$   $[0.1 \cdot \alpha_1(7) + 0.9 \cdot \alpha_2(7)] \cdot 0.89 = 0.0034$

Table 2. Backward variable calculations

n   $\beta_1(n)$                                                                  $\beta_2(n)$
8   $= 1$                                                                         $= 1$
7   $0.9 \cdot 0.11\,\beta_1(8) + 0.1 \cdot 0.89\,\beta_2(8) = 0.188$             $0.1 \cdot 0.11\,\beta_1(8) + 0.9 \cdot 0.89\,\beta_2(8) = 0.812$
6   $0.9 \cdot 0.11\,\beta_1(7) + 0.1 \cdot 0.89\,\beta_2(7) = 0.0909$            $0.1 \cdot 0.11\,\beta_1(7) + 0.9 \cdot 0.89\,\beta_2(7) = 0.6525$
5   $0.9 \cdot 0.11\,\beta_1(6) + 0.1 \cdot 0.89\,\beta_2(6) = 0.0671$            $0.1 \cdot 0.11\,\beta_1(6) + 0.9 \cdot 0.89\,\beta_2(6) = 0.5236$
4   $0.9 \cdot 0.89\,\beta_1(5) + 0.1 \cdot 0.11\,\beta_2(5) = 0.0595$            $0.1 \cdot 0.89\,\beta_1(5) + 0.9 \cdot 0.11\,\beta_2(5) = 0.0578$
3   $0.9 \cdot 0.89\,\beta_1(4) + 0.1 \cdot 0.11\,\beta_2(4) = 0.0483$            $0.1 \cdot 0.89\,\beta_1(4) + 0.9 \cdot 0.11\,\beta_2(4) = 0.0110$
2   $0.9 \cdot 0.11\,\beta_1(3) + 0.1 \cdot 0.89\,\beta_2(3) = 0.00576$           $0.1 \cdot 0.11\,\beta_1(3) + 0.9 \cdot 0.89\,\beta_2(3) = 0.00936$
1   $0.9 \cdot 0.11\,\beta_1(2) + 0.1 \cdot 0.89\,\beta_2(2) = 0.0014$            $0.1 \cdot 0.11\,\beta_1(2) + 0.9 \cdot 0.89\,\beta_2(2) = 0.0075$

To find the probability of being in state $i$ at time $n$ given the model $\lambda$ and the
observation sequence $Y = (2, 2, 2, 1, 1, 2, 2, 2)$, the forward and backward variable
calculations of Tables 1 and 2 are used to calculate

$$\gamma_i(n) = P\{X_n = i \mid Y, \lambda\} = \frac{\alpha_i(n) \cdot \beta_i(n)}{\sum_{j=1}^{S} \alpha_j(n) \cdot \beta_j(n)}.$$

Table 3 shows the resulting probabilities; the larger entry in each row indicates the
more likely state at time $n$. Thus, the best estimate of the sequence of hidden states
given the observation sequence is

$$\{X_n\}_1^8 = (s_2, s_2, s_2, s_1, s_1, s_2, s_2, s_2).$$

Table 3. Gamma variable calculations

n   $\gamma_1(n)$   $\gamma_2(n)$
1   0.02243         0.97757
2   0.01731         0.98269
3   0.07015         0.92985
4   0.51604         0.48396
5   0.51604         0.48396
6   0.07015         0.92985
7   0.01731         0.98269
8   0.02243         0.97757
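The example can be reproduced numerically with the short Python sketch below. It follows the indexing of Tables 1 and 2 ($\alpha(0) = \pi$ and $\beta(8) = 1$); the variable names and the mapping of symbols 1 and 2 onto zero-based list indices are assumptions made for the illustration.

```python
A = [[0.9, 0.1], [0.1, 0.9]]       # A[i][j]: transition from state i+1 to state j+1
B = [[0.89, 0.11], [0.11, 0.89]]   # B[j][k]: state j+1 emits symbol k+1
pi = [0.5, 0.5]
Y = [2, 2, 2, 1, 1, 2, 2, 2]
obs = [y - 1 for y in Y]           # zero-based symbols
S, T = len(pi), len(obs)

# Forward pass as in Table 1: alpha(0) = pi, then one inductive step per observation.
alpha = [pi[:]]
for y in obs:
    prev = alpha[-1]
    alpha.append([sum(prev[i] * A[i][j] for i in range(S)) * B[j][y]
                  for j in range(S)])

# Backward pass as in Table 2: beta(8) = 1, then step back to n = 1.
beta = {T: [1.0] * S}
for n in range(T - 1, 0, -1):
    y_next = obs[n]                # Y_{n+1} in the one-based time index
    beta[n] = [sum(A[i][j] * B[j][y_next] * beta[n + 1][j] for j in range(S))
               for i in range(S)]

# gamma_i(n) for n = 1, ..., 8 and the individually most likely states.
gamma = []
for n in range(1, T + 1):
    joint = [alpha[n][i] * beta[n][i] for i in range(S)]
    gamma.append([p / sum(joint) for p in joint])
states = [1 + max(range(S), key=row.__getitem__) for row in gamma]

for n, row in enumerate(gamma, start=1):
    print(n, round(row[0], 5), round(row[1], 5))
print(states)
```

The final line prints `[2, 2, 2, 1, 1, 2, 2, 2]`, the state sequence reported with Table 3. The posteriors are symmetric in time because the observation sequence is a palindrome and the chain is started from its stationary distribution.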

2.1.3.8  Learning  Problem  using  Expectation  Maximization  (EM)

In this section an EM approach is taken toward parameter re-estimation of a
discrete HMM. The goal is to maximize the likelihood (or in this case log-likelihood)
function $\mathcal{L}$ by finding the maximum likelihood estimates (MLE) of the model parameters
$\lambda$ given the complete data (i.e., the observation data $Y$ and the hidden state
sequence $X$). The likelihood function with complete data is

$$\mathcal{L}(\lambda) = \log P(Y, X \mid \lambda).$$

However, we have incomplete data in the case of the hidden Markov model: we do
not know the true hidden state sequence, $X$. We seek to maximize the likelihood
of the parameters $\lambda$ given the observation data $Y$, marginalizing over the
missing state sequence data $X$:

$$\hat{\lambda} = \operatorname*{argmax}_{\lambda} \left( \log \sum_{X} P(Y, X \mid \lambda) \right).$$

Finding the MLE for $\lambda$ by maximizing $\mathcal{L}(\lambda)$ directly can be difficult to compute
(log of a large sum). A simplification makes use of Jensen’s inequality, which states

$$E[f(X)] \le f(E[X])$$

for $X$ a random variable and $f$ a concave (e.g. log) function defined over at least
the range of $X$, which changes the log of a large sum to a sum of logs. Thus, given an
arbitrary distribution $F(X)$ over the hidden variables

$$\log \sum_{X} P(Y, X \mid \lambda) = \log \sum_{X} F(X) \frac{P(Y, X \mid \lambda)}{F(X)} \tag{20}$$

$$\ge \sum_{X} F(X) \log \frac{P(Y, X \mid \lambda)}{F(X)} \tag{21}$$

$$= \sum_{X} F(X) \log P(Y, X \mid \lambda) - \sum_{X} F(X) \log F(X) \tag{22}$$

$$= Q(\lambda, F). \tag{23}$$

The EM algorithm [25] provides an iterative approximation method which
alternates between maximizing $Q$ with respect to $F$ while holding $\lambda$ fixed and maximizing
$Q$ with respect to $\lambda$ while holding $F$ fixed:

1. Set $p = 0$ and choose $\lambda_p$, the initial HMM parameter estimates.

2. Perform the expectation step:

$$F_{p+1} \leftarrow \operatorname*{argmax}_{F} Q(\lambda_p, F) \tag{24}$$

3. Perform the maximization step:

$$\lambda_{p+1} \leftarrow \operatorname*{argmax}_{\lambda} Q(\lambda, F_{p+1}) \tag{25}$$

4. Replace $p$ with $p + 1$ and repeat steps 2 through 4 until a stopping threshold
is reached.

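Finding $F_{p+1}$ in step 2 begins with

$$Q(\lambda_p, F) = \sum_{X} F(X) \log P(Y, X \mid \lambda_p) - \sum_{X} F(X) \log F(X) \tag{26}$$

and maximizing over $F$. To do this the Lagrangian

$$\tilde{Q} = Q + \gamma \left( \sum_{X} F(X) - 1 \right),$$

is introduced. Determining its derivative

$$\frac{\partial \tilde{Q}}{\partial F} = \log P(Y, X \mid \lambda_p) - \log F(X) - 1 + \gamma$$

and setting it equal to zero yields

$$0 = \log P(Y, X \mid \lambda_p) - \log F(X) - 1 + \gamma$$
$$\log F(X) = \log P(Y, X \mid \lambda_p) - 1 + \gamma$$
$$F(X) = P(Y, X \mid \lambda_p) \cdot e^{\gamma - 1}. \tag{27}$$

Summing over $X$ yields

$$\sum_{X} F(X) = e^{\gamma - 1} \sum_{X} P(Y, X \mid \lambda_p)$$
$$e^{\gamma - 1} = \frac{1}{P(Y \mid \lambda_p)} \tag{28}$$

and substituting Eq. 28 into Eq. 27 gives

$$F(X) = \frac{P(Y, X \mid \lambda_p)}{P(Y \mid \lambda_p)} = P(X \mid Y, \lambda_p). \tag{29}$$

When $F_{p+1}$ is set to $P(X \mid Y, \lambda_p)$, Eq. 26 becomes

$$Q(\lambda_p, F_{p+1}) = Q(\lambda_p, P(X \mid Y, \lambda_p))$$
$$= \sum_{X} P(X \mid Y, \lambda_p) \log P(Y, X \mid \lambda_p) - \sum_{X} P(X \mid Y, \lambda_p) \log P(X \mid Y, \lambda_p)$$
$$= \sum_{X} P(X \mid Y, \lambda_p) \log \frac{P(Y, X \mid \lambda_p)}{P(X \mid Y, \lambda_p)}$$
$$= \sum_{X} P(X \mid Y, \lambda_p) \cdot \log P(Y \mid \lambda_p)$$
$$= \log P(Y \mid \lambda_p) \cdot \sum_{X} P(X \mid Y, \lambda_p)$$
$$= \log P(Y \mid \lambda_p) \cdot 1$$
$$= \log P(Y \mid \lambda_p)$$
$$= \mathcal{L}(\lambda_p). \tag{30}$$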

Thus, the maximum in step 2 is obtained by setting $F_{p+1}(X) = P(X \mid Y, \lambda_p)$,
where the bound becomes an equality with the objective $Q(\lambda_p, F_{p+1}) = \mathcal{L}(\lambda_p)$. The
maximum in step 3 is found by maximizing the first term of Eq. 22, since the second
term does not depend on $\lambda$. Thus

$$\lambda_{p+1} \leftarrow \operatorname*{argmax}_{\lambda} \sum_{X} P(X \mid Y, \lambda_p) \log P(Y, X \mid \lambda) \tag{31}$$

The sequence $\lambda_0, \lambda_1, \ldots, \lambda_p$, for $p \ge 0$, yields nondecreasing values of the likelihood
function that converge to a local maximum. Thus $Q$ forms a lower bound of the
likelihood function $\mathcal{L}$. The EM algorithm ascends the likelihood function in the
parameter space.
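The lower-bound property can be checked numerically on a model small enough to enumerate every hidden state sequence. The sketch below evaluates $Q(\lambda, F)$ for a uniform $F$ and for the posterior $F(X) = P(X \mid Y, \lambda)$ and compares both with $\log P(Y \mid \lambda)$. The model values, function names, zero-based indices, and the convention that the path starts at the first observation are assumptions for illustration.

```python
import math
from itertools import product


def joint_probability(A, B, pi, obs, path):
    """P(Y, X | lambda) for one hidden state sequence."""
    p = pi[path[0]] * B[path[0]][obs[0]]
    for n in range(1, len(obs)):
        p *= A[path[n - 1]][path[n]] * B[path[n]][obs[n]]
    return p


def lower_bound(joint, F):
    """Q(lambda, F) = sum_X F(X) log P(Y, X | lambda) - sum_X F(X) log F(X)."""
    return sum(f * (math.log(p) - math.log(f))
               for p, f in zip(joint, F) if f > 0.0)


# Hypothetical model with strictly positive parameters.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 2, 1, 1]

paths = list(product(range(len(pi)), repeat=len(obs)))
joint = [joint_probability(A, B, pi, obs, x) for x in paths]
log_likelihood = math.log(sum(joint))

uniform_F = [1.0 / len(paths)] * len(paths)
posterior_F = [p / sum(joint) for p in joint]   # F(X) = P(X | Y, lambda)

q_uniform = lower_bound(joint, uniform_F)
q_posterior = lower_bound(joint, posterior_F)
print(q_uniform, q_posterior, log_likelihood)
```

The uniform choice gives a value strictly below the log-likelihood, while the posterior choice attains it up to rounding, which is the equality used in the expectation step.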

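In the following steps we evaluate Eq. 31 by summing over all $X \in \{X_t\}$ to
define the incomplete-data $Q$ function in terms of the complete-data:

$$Q(\lambda, F_{p+1}) = \sum_{X} P(X \mid Y, \lambda_p) \log P(Y, X \mid \lambda) \tag{32}$$

Given a particular state sequence $X = x_0 x_1 \ldots x_T$,

$$P(Y, X \mid \lambda) = \pi_{x_0} \prod_{n=1}^{T} a_{x_{n-1} x_n} b_{x_n y_n}.$$

Substituting back into the $Q$ function, and simplifying yields

$$Q(\lambda, F_{p+1}) = \sum_{X} P(X \mid Y, \lambda_p) \log \left( \pi_{x_0} \prod_{n=1}^{T} a_{x_{n-1} x_n} b_{x_n y_n} \right)$$
$$= \sum_{X} \log \pi_{x_0} \cdot P(X \mid Y, \lambda_p) + \sum_{X} \left[ \sum_{n=1}^{T} \log a_{x_{n-1} x_n} b_{x_n y_n} \right] \cdot P(X \mid Y, \lambda_p)$$
$$= \sum_{X} \log \pi_{x_0} \cdot P(X \mid Y, \lambda_p) + \sum_{X} \left[ \sum_{n=1}^{T} \left( \log a_{x_{n-1} x_n} + \log b_{x_n y_n} \right) \right] \cdot P(X \mid Y, \lambda_p)$$
$$= \sum_{X} \log \pi_{x_0} \cdot P(X \mid Y, \lambda_p) + \sum_{X} \left[ \sum_{n=1}^{T} \log a_{x_{n-1} x_n} \right] \cdot P(X \mid Y, \lambda_p) + \sum_{X} \left[ \sum_{n=1}^{T} \log b_{x_n y_n} \right] \cdot P(X \mid Y, \lambda_p). \tag{33}$$

Taking each term of Eq. 33 in turn, the parameters of the HMM are optimized.
Taking the first term and finding the marginal expression at time $n = 0$ gives

$$\sum_{X} \log \pi_{x_0} \cdot P(X \mid Y, \lambda_p) = \sum_{i=1}^{S} \log \pi_i \cdot P(x_0 = i \mid Y, \lambda_p),$$

where $S$ is the number of hidden states. To optimize, a Lagrange multiplier $\gamma$ is used
and the added stochastic constraint $\sum_{i=1}^{S} \pi_i = 1$ is enforced:

$$\sum_{i=1}^{S} \log \pi_i \cdot P(x_0 = i \mid Y, \lambda_p) + \gamma \left( \sum_{i=1}^{S} \pi_i - 1 \right).$$

Solving for $\pi_i$ yields

$$\pi_i = P(x_0 = i \mid Y, \lambda_p). \tag{34}$$

The second term of Eq. 33 becomes

$$\sum_{X} \left[ \sum_{n=1}^{T} \log a_{x_{n-1} x_n} \right] \cdot P(X \mid Y, \lambda_p) = \sum_{i=1}^{S} \sum_{j=1}^{S} \sum_{n=1}^{T} \log a_{ij} \, P(x_{n-1} = i, x_n = j \mid Y, \lambda_p).$$

Applying a Lagrange multiplier and the constraint $\sum_{j=1}^{S} a_{ij} = 1$ and solving yields

$$a_{ij} = \frac{\sum_{n=1}^{T} P(x_{n-1} = i, x_n = j \mid Y, \lambda_p)}{\sum_{n=1}^{T} P(x_{n-1} = i \mid Y, \lambda_p)}. \tag{35}$$

The third term of Eq. 33 becomes

$$\sum_{X} \left[ \sum_{n=1}^{T} \log b_{x_n y_n} \right] \cdot P(X \mid Y, \lambda_p) = \sum_{j=1}^{S} \sum_{n=1}^{T} \log b_{j y_n} \, P(x_n = j \mid Y, \lambda_p).$$

Applying a Lagrange multiplier and the constraint $\sum_{i=1}^{Q} b_{ji} = 1$ (where $Q$ is the size
of the discrete alphabet) and solving yields

$$b_{ji} = \frac{\sum_{n=1}^{T} P(x_n = j \mid Y, \lambda_p) \, \delta_{y_n, i}}{\sum_{n=1}^{T} P(x_n = j \mid Y, \lambda_p)}, \tag{36}$$

where the $\delta$-function contributes only when the $n^{\text{th}}$ observation matches the $i^{\text{th}}$
symbol of the observation alphabet. Note that the parameter re-estimation equations
of the EM development (Eqns. 34, 35, and 36) match those of the previous
development (Eqns. 14, 15, and 16) with the subtle difference of indexing time at
$n = 0$ to $T$ instead of $n = 1$ to $T + 1$.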


2.1.3.9  Extension  to  Continuous  Observation  Space

To this point, development of HMM theory uses a discrete observation space,
i.e., a discrete probability density associated with each hidden state models the
observations. For problems where observations are continuous signals, a method
must be used to quantize the signals into a discrete space. This process may degrade
model performance by losing information through quantizing. A useful extension of
the discrete HMM is one with continuous observation densities.

Previously, the observation distribution matrix $B$ is defined by $[b_{ji}] = P\{Y_n = i \mid X_n = j\}$,
where each observation $Y_n$ of the sequence $\{Y_n, n \ge 0\}$ is one of $Q$
possible values. In the continuous case, researchers typically use a finite mixture of
Gaussians to approximate any finite, continuous density function [18]. A continuous
observation probability with $M$ mixture components and $S$ states has the form

$$b_j(Y) = \sum_{k=1}^{M} c_{jk} \, \mathcal{N}(Y; \mu_{jk}, \Sigma_{jk}) \text{ for } 1 \le j \le S, \tag{37}$$

where $Y$ is the observation, $c_{jk}$ is the mixture weight of the $k^{\text{th}}$ component in state $j$,
and $\mathcal{N}$ is the Gaussian kernel for the $k^{\text{th}}$ mixture component in state $j$ with mean
$\mu_{jk}$ and covariance $\Sigma_{jk}$. For a $d$-dimensional observation the kernel is

$$\mathcal{N}(Y; \mu_{jk}, \Sigma_{jk}) = \frac{1}{(2\pi)^{d/2} |\Sigma_{jk}|^{1/2}} \exp \left[ -\frac{1}{2} (Y - \mu_{jk})' \Sigma_{jk}^{-1} (Y - \mu_{jk}) \right]. \tag{38}$$

The following constraints must be satisfied by the mixture components:

$$\sum_{k=1}^{M} c_{jk} = 1 \text{ for } 1 \le j \le S \tag{39}$$

$$c_{jk} \ge 0 \text{ for } 1 \le j \le S, \ 1 \le k \le M, \tag{40}$$

leading to a probability density function that integrates to one over the observations

$$\int_{-\infty}^{\infty} b_j(Y) \, dY = 1 \text{ for } 1 \le j \le S. \tag{41}$$
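For intuition, the mixture density of a single hidden state can be sketched for scalar observations ($d = 1$), where the covariance reduces to a variance. The weights, means, variances, and function names below are hypothetical, and the normalization in Eq. 41 is checked with a simple midpoint rule rather than in closed form.

```python
import math


def gaussian_kernel(y, mean, variance):
    """Univariate Gaussian density, the d = 1 case of the kernel."""
    return math.exp(-0.5 * (y - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def emission_density(y, weights, means, variances):
    """b_j(y) = sum_k c_jk * N(y; mu_jk, sigma_jk^2) for one hidden state j."""
    return sum(c * gaussian_kernel(y, m, v)
               for c, m, v in zip(weights, means, variances))


# Hypothetical two-component mixture for a single state.
weights = [0.3, 0.7]      # c_jk, nonnegative and summing to one
means = [0.0, 3.0]        # mu_jk
variances = [1.0, 0.25]   # sigma_jk^2

# Midpoint-rule check that the density integrates to one.
step, lower, upper = 0.01, -15.0, 15.0
count = int(round((upper - lower) / step))
integral = sum(emission_density(lower + (k + 0.5) * step, weights, means, variances)
               for k in range(count)) * step
print(emission_density(2.5, weights, means, variances), integral)
```

In a continuous HMM this density takes the place of the entry $b_{jY_n}$ in the forward and backward recursions; a vector-valued version would need the determinant and inverse of each $\Sigma_{jk}$.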


2.1.3.10  HMMs  in  Automatic  Target  Recognition 

This section reviews applications in the literature of HMMs to target recognition,
specifically using high range-resolution radar (HRR) signatures as the source of
classifier feature data. The purpose is to highlight encouraging results while noting
the assumptions and limitations of each experiment.

A  series  of  Air  Force  Institute  of  Technology  (AFIT)  research  efforts  [42,  17, 
43]  applied  HMMs  to  pattern  recognition  problems  using  features  based  on  HRR 
signatures.  DeWitt  [42]  processed  HRR  signatures  produced  by  a  synthetic  CAD- 
based  Xpatch®  model  using  the  Prony  technique.  The  feature  vectors  produced  by 
the  Prony  technique  describe  scattering  centers  of  the  target.  These  feature  vectors 
were quantized using a k-means method in order to apply a discrete HMM. DeWitt
considered  a  two-class  problem  with  prior  knowledge  of  target  aspect  and  azimuth  to 
within  ±5°.  To  test  classifier  robustness,  Gaussian  noise  was  added  to  the  training 
data. 

Fielding  [17]  compared  discrete  and  Gaussian-mixture  HMMs  in  an  effort  to 
classify sequences of 2-D images of 3-D objects. A five-class problem of ground targets
with additive noise was studied. Feature data was derived from the coefficients
of  low-frequency  Fourier  transformed  CAD-based  target  images.  Prior  knowledge  of 
target  aspect  angle  was  ±45°.  In  the  discrete  case  a  clustering  method  was  used  to 
quantize  the  data.  Fielding  found  that  the  continuous  HMM  performed  better  than 
the  discrete  HMM  in  general  but  not  at  all  experiment  design  points. 

MacDonald  [43]  applied  Gaussian-mixture  HMMs  operating  on  low-frequency 
spectral components of Fourier transformed HRR signatures. He studied a three-class
problem of airborne targets. The research found that forcing a relationship between
the hidden states and target orientation improved classification performance.
The  process  resulted  in  an  observable  Markov  process  rather  than  a  hidden  one.  It 
was  unclear  how  training  and  testing  data  were  segregated  (if  at  all). 

Another  series  of  inter-related  research,  separate  from  the  above  listed  AFIT 
research,  focused  on  HMM-based  time-series  classification  [44,  45,  46,  47,  48].  Runkle 
compared discrete versus Gaussian-mixture HMMs in classifying submerged objects
using features extracted from sequences of acoustic waveforms and demonstrated the
marked benefit of using continuous HMMs [44, 45].

Bharadwaj and Runkle applied continuous HMMs with linear density distributions
in the observed feature space [46]. The study used two airborne targets
modeled  by  Xpatch®  with  features  extracted  via  matching  pursuits. 

Liao  and  Runkle  applied  HMMs  to  ground  target  identification  using  features 
extracted  from  SAR-based  HRR  signatures  [47,  48].  In  both  papers  the  RELAX 
algorithm  [49]  was  used  to  extract  point  scatterer  features  from  HRR  signatures  of 
sequenced  SAR  data  of  ten  target  classes  from  the  Moving  and  Stationary  Target 
Acquisition  and  Recognition  (MSTAR)  data  collection  (covered  in  Section  2.2). 

Other  research  has  been  reported  in  the  area  of  HMMs  using  HRR  signatures 
for  target  classification.  Paul  implemented  a  hybrid  classifier  using  an  eigen-template 
to  score  HRR  signatures  prior  to  being  input  to  discrete  HMM  classifiers  [50].  His 
study  used  MSTAR  data  with  four  targets,  but  appears  to  have  used  the  same  data 
to  train  and  test  the  hybrid  classifier. 

In Kottke et al. [51] and Nilubol et al. [52, 53] a Radon transformation on
segmented two-dimensional SAR images was used to produce rotation- and
translation-independent features. These features were ordered, clustered, and
input to class-specific discrete HMMs for classification.

Additionally, Jacobs et al. [54], Zhou et al. [55], and Pei et al. [56] each
implemented HMM-based classifiers acting on sequenced HRR signatures.

Evidence  of  success  in  applying  HMMs  to  the  problem  of  sequential  observation 
target  classification  warrants  further  study.  A  review  of  the  literature  shows  several 
areas  of  potential  research: 

• use collected SAR data instead of synthetic data to best capture realistic
operating conditions

• use multi-class target sets with greater than five target classes

• use a methodology to design HMM structures that is supported by model
selection and information theory

• include a rejection option for classifier labeling

• study out-of-library performance for HMMs

•  study  the  impact  of  prior  knowledge  of  initial  target  pose 

•  study  classifier  performance  when  constrained  by  warfighter  preferences 

2.2  High  range-resolution  radar 

2.2.1  Introduction 

Target recognition of moving targets based on SAR imaging poses a challenge
due to blurring from target motion while forming the synthetic aperture. Recent
research [57, 58, 59] points to high range-resolution radar (HRR) as a possible
approach for recognizing moving targets. Here, HRR refers to a radar operating in a
specified bandwidth that is capable of producing high-resolution returns with
significantly enhanced target-to-clutter (and noise) ratios through Doppler filtering and
clutter cancellation. Returns from HRRs form focused range (or one-dimensional)
profiles which identify specific target scattering centers. These scattering centers are
related to the physical geometry and material composition of the target and thus
form a means of identifying the target.

2.2.2  Literature 

Several  AFIT  research  efforts  have  studied  HRR  signatures  and  their  use  in 
target classification. In addition to DeWitt [42] and MacDonald [43], as described in
Section 2.1.3.10, Meyer’s PhD research [60] studied invariant features drawn from
sequenced HRR signatures and applied a template-based classifier.

Zumwalt’s master’s thesis [61] used XPatch®-derived HRR signatures of airborne
targets as the feature source. Zumwalt proposed a multinomial pattern matching
classifier which outperformed baseline linear and quadratic classifiers.

HRR-based target recognition research outside of AFIT includes that of Williams
[57, 58, 59], who proposes template-based ATR algorithms using HRR-derived features.
Mitchell [62] introduced a statistical feature-based classifier acting on HRR profiles
of airborne targets. Shaw [63] used a template-based classifier with eigenvalues
associated with HRR profiles across aspect angle. Zajic [64] employed wavelet-based
features drawn from HRR profiles in a template scheme.

The research performed and reported here combines HMMs with HRR signatures
in a classification experiment. Fundamentally, an airborne radar illuminates
a ground target and the reflected radar information is collected and processed for
classification. Previous efforts have used features derived from HRR profiles in
classifying airborne and ground-based targets, but fusing multiple HMMs operating on
HRR-derived features breaks new ground.

2.2.3  HRR  Processing 

This  research  uses  complex  SAR  data  contained  in  two  collections,  MSTAR  [65] 
and  DCS.  Processing  of  the  SAR  data  is  required  to  form  HRR  signatures.  This 
section  describes  the  required  steps. 

• The target in the SAR chip is segmented (outlined) from background clutter
using a target-sized mask to simulate Doppler filtering (MSTAR contains
stationary targets).

• The SAR image formation process in the cross-range dimension is reversed.

• Cross-range inverse FFTs are applied to obtain range signatures collected over
the synthetic aperture.

• The complex range/angle data is de-weighted in angle using an inverse Taylor
window over the valid data.

• Each range bin is magnitude-detected and normalized by the mean power in
the signature to remove automatic gain control and range effects.

• The pixels are averaged in azimuth to form the HRR profile.

Of  the  SAR  imagery  used  in  the  research  of  Section  2.1.3.10,  the  most  realistic 
studies  used  HRR  signatures  derived  from  MSTAR  SAR  data. 

2.2.4  MSTAR  Program 

The  conversion  process  begins  with  the  SAR  chip.  The  example  chip  is  taken 
from  the  MSTAR  publicly-available  data  set,  a  subset  of  Collection  1  taken  Sep 
1995  at  the  Redstone  Arsenal,  Huntsville,  AL  by  the  Sandia  National  Laboratory 
(SNL)  STARLOS  sensor,  operating  at  X-band  in  one  foot  resolution  spotlight  mode. 
The  collection  was  jointly  sponsored  by  DARPA  and  AFRL  as  part  of  the  MSTAR 
program. 

The  example  SAR  chip  is  of  a  T-72  main  battle  tank,  serial  number  812  (1  of  3 
T-72  tanks  imaged  in  the  publicly-available  data  set),  17  degree  angle  of  depression, 
and  an  aspect  angle  of  345.8  degrees.  Figure  7  shows  a  photograph  of  the  T-72 
target. 

2.2.5  SAR  Chip 

This section discusses MSTAR target chip image files. Target chips are subimages
extracted from MSTAR target-type full scene images. MSTAR target chips
consist of an ASCII Phoenix header followed by a section of 32-bit floating point
magnitude data and a section of 32-bit floating point phase data (in polar complex
format). Target chip image data is calibrated in units of meters for magnitude data
and radians for phase data. Figure 8 shows the raw magnitude and phase chip data
in grayscale form.

Figure 7. Photograph of the target T-72 main battle tank.


Figure  8.  Magnitude  (on  the  left)  and  phase  (on  the  right)  information  from  the 
example  T-72  MSTAR  SAR  chip.  Both  are  128  by  128  pixels  in  size. 
Pixel  information  is  shown  through  a  256-level  grayscale. 

The  Phoenix  header  is  the  standard  ASCII  data  header  included  with  all 
MSTAR image files. MSTAR target chip Phoenix headers contain general and
sensor-specific  information.  Table  4  contains  the  complete  header  information  from 
the  example  MSTAR  SAR  chip. 
Table 4. MSTAR SAR chip header information

Field                           Value
PhoenixHeaderLength:            1975
PhoenixSigSize:                 133047
PhoenixSigNum:                  1
PhoenixHeaderCallingSequence:
HeaderVersionNumber:            ‘2CM’
native_header_length:           0
Filename:                       ‘hb03787.0016’
Chip_MD5_CheckSum:              ‘d721d2b842fe498a9f3ccb67c797fac9’
ParentScene:                    ‘hb03787’
Site:                           ‘redstn’
NumberOfColumns:                128
NumberOfRows:                   128
TargetType:                     ‘t72_tank’
TargetSerNum:                   ‘812’
TargetAz:                       345.7742
TargetRoll:                     -0.5911
TargetPitch:                    359.6368
TargetYaw:                      35.0758
DesiredDepression:              17
DesiredGroundPlaneSquint:       -90
DesiredSlantPlaneSquint:        -90
DesiredRange:                   4500
DesiredAimpointLat:             34.6781
DesiredAimpointLong:            86.6874
DesiredAimpointElevation:       166
DesiredAimpointLatRef:          ‘N’
DesiredAimpointLongRef:         ‘W’
MeasDepression:                 17.0938
MeasGroundPlaneSquint:          -91.5775
MeasSlantPlaneSquint:           -91.5078
MeasuredRange:                  4475
MeasAimpointLat:                34.6781
MeasAimpointLong:               273.3092
MeasAimpointElevation:          165.3860
MeasAimpointLatRef:             ‘N’
MeasAimpointLongRef:            ‘E’
MeasAntennaLat:                 34.6533
MeasAntennaLong:                -86.6551
MeasAircraftHeading:            41.6016
MeasAircraftAltitude:           1.4808e+003
RadarMode:                      ‘mode 5 - spot light’
SensorCalibrationFactor:        42.9960
RadarPosition:                  ‘bottom’
Range3dBWidth:                  0.3013
CrossRange3dBWidth:             0.3229
SceneCenterRefLine:             40
X_Velocity:                     39.4570
DataCollectors:                 ‘Sandia National Lab’
CollectionDate:                 19950902
CollectionTime:                 82205
CollectionName:                 ‘hb’
SensorName:                     ‘Twin Otter’
Classification:                 ‘UNCLASSIFIED’
MultiplicativeNoise:            ‘-10 dB’
AdditiveNoise:                  ‘-32 to -34 dB’
CenterFrequency:                ‘9.60 GHz’
CrossRangeWeighting:            ‘-35dB_Taylor’
RangeWeighting:                 ‘-35dB_Taylor’
DynamicRange:                   ‘64 dB’
Bandwidth:                      ‘0.591 GHz’
RangeResolution:                0.3047
CrossRangeResolution:           0.3047
RangePixelSpacing:              0.2021
CrossRangePixelSpacing:         0.2031
AverageImageCalFactor:          0.9708
Polarization:                   ‘HH’
TargetSeasonalCover:            ‘only growing vegitation’
TargetWaterContent:             ‘dry’
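
The header values are consistent with the chip layout described in this section. As a quick arithmetic check (a sketch only; the variable names are illustrative and are not part of any MSTAR tool), the header length plus two 128 by 128 blocks of 32-bit values should equal the reported PhoenixSigSize:

```python
# Values copied from the example chip header; names are illustrative only.
header_length = 1975      # PhoenixHeaderLength, in bytes
rows, cols = 128, 128     # NumberOfRows, NumberOfColumns
bytes_per_value = 4       # one 32-bit floating point value

magnitude_bytes = rows * cols * bytes_per_value   # magnitude section
phase_bytes = rows * cols * bytes_per_value       # phase section
expected_size = header_length + magnitude_bytes + phase_bytes

print(expected_size)  # 133047, which matches PhoenixSigSize
```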
2.2.6 SAR Chip Manipulation


The original complex SAR chip is formed by combining the 128 by 128 magnitude
information with the 128 by 128 phase information:

$$C_{orig} = M e^{iP},$$

where M is the matrix containing magnitude information, P is the matrix containing
phase information, and the product and exponential are taken element by element.
Figure 9 shows the 128 by 128 pixel complex modulus.

Figure 9. Combining the magnitude and phase information results in the baseline
complex SAR chip.

To obtain the phase histories, or range profiles, several steps made in forming
the MSTAR images are undone. The MSTAR images are formed by taking a 2-D
inverse FFT of the Taylor-windowed, zero-padded phase history data on a rectangular
grid. To undo these steps, the 2-D FFT of the 128 by 128 complex pixel chip
described above is taken, then the transformed signal is shifted so that the low
frequencies occur in the center. Figure 10 shows the resulting 2-D signal of the example
MSTAR SAR chip.



Figure  10.  Mesh  plot  of  the  magnitude  of  the  2-D  signal  for  the  MSTAR  sample 
chip.  A  grayscale  image  of  the  same  signal  is  shown  on  the  right. 

A  noticeable  band  of  near-zero  values  appears  at  the  border  of  the  2-D  signal 
seen  in  Fig.  10.  This  band  is  assumed  to  be  a  result  of  zero-padding,  and  a  14  pixel 
wide  band  is  removed  from  the  perimeter  of  the  signal,  leaving  the  100  by  100  signal 
shown  in  Fig.  11.  Next,  a  process  is  undertaken  to  remove  the  Taylor  windowing 
implemented when the SAR data is collected. MSTAR uses a 35 dB Taylor window
with $\bar{n} = 4$. Figure 11 shows the described 2-D Taylor window.


Figure  11.  A  cropped  100  by  100  grayscale  image  of  the  magnitude  of  the  signal 
shown in Fig. 10. In the middle is a 3-D plot of a Taylor window with
100 coefficients, a 35 dB sidelobe suppression level, and $\bar{n} = 4$, and on
the  right  is  the  cropped  signal  with  the  windowing  removed. 


Finally, the cropped unwindowed signal information, also called the phase history,
is processed by a 1-D FFT along the range dimension to reveal the range profiles
shown in Fig. 12. The magnitude of these range profiles and the mean profile are
also plotted. Features are extracted from the 1-D mean range profile.


Figure  12.  Taking  a  1-D  FFT  of  the  cropped  unwindowed  signal  of  Fig.  11  results 
in  the  range  profile  information  shown  in  the  left  plot.  On  the  right, 
the  individual  range  profiles  (columns  of  left  plot)  are  plotted  in  gray 
with  the  mean  profile  shown  in  black.  Features  are  extracted  from  the 
mean  profile. 
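
The chip manipulation steps of this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the processing code used in the research: the function name, the choice of axis 0 as the range dimension, and the use of a caller-supplied 1-D window are all assumptions, and the mean-power normalization listed in Section 2.2.3 is omitted.

```python
import numpy as np

def sar_chip_to_hrr(magnitude, phase, window_1d, border=14, range_axis=0):
    """Sketch of the chip-to-HRR steps; range_axis=0 is an assumption."""
    # Combine magnitude and phase element by element into the complex chip.
    chip = magnitude * np.exp(1j * phase)
    # Undo image formation: 2-D FFT, then shift low frequencies to the center.
    signal = np.fft.fftshift(np.fft.fft2(chip))
    # Remove the near-zero band from the perimeter (128 -> 100 per side).
    signal = signal[border:-border, border:-border]
    # Remove the separable 2-D window by dividing by its outer product.
    phase_history = signal / np.outer(window_1d, window_1d)
    # A 1-D FFT along the range dimension gives the range profiles.
    profiles = np.abs(np.fft.fft(phase_history, axis=range_axis))
    # Average over the other (cross-range) dimension for the mean profile.
    mean_profile = profiles.mean(axis=1 - range_axis)
    return profiles, mean_profile

# Synthetic stand-in data; a real chip would supply magnitude and phase.
rng = np.random.default_rng(0)
mag = rng.random((128, 128))
ph = rng.uniform(-np.pi, np.pi, (128, 128))
stand_in_window = np.hamming(100)  # placeholder only, NOT a Taylor window
profiles, mean_profile = sar_chip_to_hrr(mag, ph, stand_in_window)
print(profiles.shape, mean_profile.shape)
```

The window is passed in rather than computed so that the sketch depends only on NumPy. For the 35 dB Taylor window described above, SciPy's `scipy.signal.windows.taylor` may be a suitable source of the 100 coefficients.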


2.3  Model  selection 

2.3.1  Introduction 

One  goal  of  the  research  is  to  develop  a  classifier  using  time-series  models 
that  explain  the  change  in  HRR  signatures  of  moving  targets.  Selecting  a  model 
from  among  candidate  models  is  the  subject  of  this  section.  The  choice  should 
not be based solely on goodness-of-fit, but should also consider model complexity.
An unnecessarily complex model may overfit a given set of data and generalize
poorly. Model selection methods trade off goodness-of-fit against model complexity in
the  search  for  the  “best”  model. 

2.3.2  Literature 

A review of the model selection literature yielded several survey papers [66,
67, 68] that treat the subject from a natural science perspective. The following
subsections on the various model selection techniques are derived from these sources
and Burnham and Anderson’s text [69].

Several  papers  specifically  address  estimation  of  the  order  of  hidden  Markov 
models.  The  order  of  an  HMM  refers  to  the  number  of  states  in  the  hidden  Markov 
chain and is a measure of model complexity. Li et al. [70] use the Bayesian
Information Criterion (BIC) to specify the order of an HMM in a handwriting-recognition
application. Ryden [71] proposes a penalized likelihood estimator which can be used
with the BIC and the Akaike Information Criterion (AIC) to estimate the order of
an HMM. Ryden shows that, in a specified limit, the estimator does not underestimate
the order.


2.3.3  Likelihood  criterion 

Three aspects determine inference from models according to Fisher [72]: (1)
model specification, (2) estimation of model parameters, and (3) estimation of
precision. The Fisher likelihood theory assumes that the model specification is correct,
leaving only the parameters of the model to be estimated. In cases such as linear
regression models, the parameters can be estimated using maximum likelihood
methods.

Suppose that a probability model g describes the probability distribution of
the data x given the model parameters $\theta$ and a model specification or type, i.e.,

$$g(x \mid \theta, \text{model}).$$

Also suppose that data is collected and a model is specified, but the model parameters
are unknown. Then the likelihood function can be used for parameter estimation:

$$\mathcal{L}(\theta \mid x, \text{model}).$$
The probability model and likelihood model differ by what is known and what is
sought.  In  the  probability  model,  the  parameters  and  the  model  are  known.  The 
probability  of  a  given  event  (the  data)  is  sought.  In  the  likelihood  model  the  data  and 
the  model  are  given,  and  estimates  of  the  model  parameters  are  sought.  Thus  the 
roles  of  the  data  and  the  parameters  are  reversed  for  the  probability  and  likelihood 
models. 

Burnham and Anderson [69] use a coin-flipping example to illustrate the likelihood
concept. The experiment flips a coin n times and observes y heads. The coin
is assumed to be unbiased and the coin flips are assumed to be independent. The
binomial model is chosen to study the experiment. The likelihood function is

$$\mathcal{L}(p \mid y, n, \text{binomial}) = \binom{n}{y} p^{y} (1 - p)^{n-y},$$

where p, the probability of a head, is the parameter of interest. One might calculate
the likelihood of many values of p and pick the most likely one as the best estimate
of p given the model. This is the maximum likelihood estimator and is found by
maximizing

$$\log(\mathcal{L}(p \mid y, n, \text{binomial})) = \log\binom{n}{y} + y \cdot \log(p) + (n - y) \cdot \log(1 - p)$$

given n flips and y heads.
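
The idea of calculating the likelihood of many values of p and keeping the most likely one can be sketched directly. The values n = 10 and y = 7 and the grid spacing below are illustrative assumptions, not data from the cited text.

```python
import numpy as np
from math import comb, log

def binomial_log_likelihood(p, y, n):
    """Binomial log-likelihood of p given y heads in n flips, for 0 < p < 1."""
    return log(comb(n, y)) + y * np.log(p) + (n - y) * np.log(1.0 - p)

n, y = 10, 7                             # hypothetical experiment
grid = np.linspace(0.001, 0.999, 999)    # candidate values of p
log_lik = binomial_log_likelihood(grid, y, n)
p_hat = grid[np.argmax(log_lik)]         # grid-search estimate of the MLE

print(round(float(p_hat), 3), y / n)
```

The grid maximum falls at the closed-form maximum likelihood estimate y/n, up to the grid spacing.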

Likelihood  ratio  tests  and  maximum  likelihood  estimation  are  popular  methods 
of  parameter  estimation.  Standard  statistics  texts  such  as  Wackerly  [73],  Mood  [74], 
Hogg  [75],  and  Bickel  [76]  treat  the  subject  thoroughly. 
2.3.4 Akaike’s information criterion

Kullback and Leibler [77] introduced a “distance” metric to compare two models
f and g,

$$I(f, g) = \int f(x) \log\left(\frac{f(x)}{g(x \mid \theta)}\right) dx,$$

where I(f, g) denotes the information lost when g is used to approximate f. The
K-L distance is a fundamental quantity in information theory and is the basis for
model selection paired with likelihood inference [69].

The K-L distance can be rewritten as

$$I(f, g) = \int f(x) \log(f(x))\, dx - \int f(x) \log(g(x \mid \theta))\, dx,$$

and, recognizing the two terms above as statistical expectations with respect to f,
is

$$I(f, g) = E_x[\log(f(x))] - E_x[\log(g(x \mid \theta))].$$

Key to the relative comparison of two models is the assumption that f refers to the
unknown “true” distribution and g is the approximating model. The “true” distribution
f, while unknown, remains constant, and $E_x[\log(f(x))]$ can be considered a
constant C when calculating a relative distance between f and g:

$$I(f, g) = C - E_x[\log(g(x \mid \theta))], \quad \text{or} \quad I(f, g) - C = -E_x[\log(g(x \mid \theta))].$$

Now $(I(f, g) - C)$ becomes the relative distance between models f and g, making
$E_x[\log(g(x \mid \theta))]$ a measure of interest for determining the best model. For instance,
given two models $g_1$ and $g_2$, if $I(f, g_1) < I(f, g_2)$ then $g_1$ is best. Also,

$$I(f, g_1) - C < I(f, g_2) - C,$$

$$-E_x[\log(g_1(x \mid \theta))] < -E_x[\log(g_2(x \mid \theta))], \quad \text{and}$$

$$I(f, g_1) - I(f, g_2) = -E_x[\log(g_1(x \mid \theta))] + E_x[\log(g_2(x \mid \theta))].$$
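
A small discrete example may make the relative-distance argument concrete. The three distributions below are hypothetical, chosen only for illustration, and the integral is replaced by a sum over three outcomes.

```python
import numpy as np

def kl_distance(f, g):
    """I(f, g) = sum over x of f(x) * log(f(x) / g(x)), discrete case."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    return float(np.sum(f * np.log(f / g)))

def neg_expected_log(f, g):
    """-E_x[log g(x)], with the expectation taken with respect to f."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    return float(-np.sum(f * np.log(g)))

f = [0.5, 0.3, 0.2]    # hypothetical "truth"
g1 = [0.4, 0.4, 0.2]   # hypothetical approximating model 1
g2 = [0.2, 0.2, 0.6]   # hypothetical approximating model 2

C = float(np.sum(np.asarray(f) * np.log(f)))   # E_x[log f(x)], the constant

d1, d2 = kl_distance(f, g1), kl_distance(f, g2)
print(d1 < d2)                                              # g1 is closer to f
print(neg_expected_log(f, g1) < neg_expected_log(f, g2))    # same ranking
```

Both comparisons give the same ordering, because the two quantities differ only by the constant C, which does not depend on the approximating model.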

Akaike’s seminal work [78] introduced a method of model selection using K-L
distance without the restriction of full knowledge of the “true” model f and the
parameters $\theta$. In Akaike’s development the unique value of $\theta$ that minimizes the
K-L distance I(f, g) is unknown. At this value, $\theta_0$, information loss is minimized, and
$\theta_0$ is estimated with the maximum likelihood estimator $\hat{\theta}$. Thus the model selection
process shifts from minimizing known ($\theta_0$) K-L distance to minimizing estimated
K-L distance ($\hat{\theta}$) based on the expected value of the estimated parameters, or

$$E_y E_x[\log(g(x \mid \hat{\theta}(y)))],$$

where x and y are independent random samples from the same distribution and
the expectations are taken with respect to truth (f). Akaike showed that using
$\log(\mathcal{L}(\hat{\theta} \mid data))$, the maximized log-likelihood for each model g given the data, to
estimate K-L distance results in an upwardly biased estimate. He also showed that the
bias can be corrected by incorporating the number of estimable parameters K, which
can be considered a measure of complexity and hence a part of the classic tradeoff
between bias and variance as a result of underfitting or overfitting data. However,
Akaike’s development finds K as a simple expression of the asymptotic bias in the
log-likelihood as an estimator of $E_y E_x[\log(g(x \mid \hat{\theta}(y)))]$. Thus, $\log(\mathcal{L}(\hat{\theta} \mid data)) - K$ is
an asymptotically unbiased estimator of $E_y E_x[\log(g(x \mid \hat{\theta}(y)))]$.

For K-L distance in the general case,

$$I(f, g) - C = -E_x[\log(g(x \mid \theta))],$$

and since the K-L distance is to be estimated using the MLE $\hat{\theta}$, the expectation of
both sides yields

$$E_y[I(f, g(x \mid \hat{\theta}(y)))] - C = -E_y E_x[\log(g(x \mid \hat{\theta}(y)))].$$

Substituting the above result for the corrected estimator gives

$$E_y[I(f, g(x \mid \hat{\theta}(y)))] - C = -\log(\mathcal{L}(\hat{\theta} \mid data)) + K,$$

which, with rearrangement and inclusion of a factor of 2, yields the Akaike information
criterion

$$AIC = -2 \cdot \log(\mathcal{L}(\hat{\theta} \mid data)) + 2K. \qquad (42)$$

2.3.5  Bayesian  information  criterion 

Schwarz [79] presents an alternative to AIC,

$$BIC = -2 \cdot \log(\mathcal{L}(\hat{\theta} \mid data)) + \log(n) \cdot K, \qquad (43)$$

where n is the sample size. The difference between AIC and BIC is that the penalty
on each parameter is log(n) in BIC rather than 2.
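
The two criteria are simple to compute once a maximized log-likelihood is available for each candidate model. The sketch below uses hypothetical log-likelihood values, parameter counts, and sample size (none are results from this research) to show that the heavier BIC penalty can favor a smaller model than AIC does.

```python
import math

def aic(log_likelihood, k):
    """AIC = -2 log L + 2K."""
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood, k, n):
    """BIC = -2 log L + log(n) K, with n the sample size."""
    return -2.0 * log_likelihood + math.log(n) * k

# Hypothetical candidates: name -> (maximized log-likelihood, parameter count K)
candidates = {"A": (-512.0, 4), "B": (-505.0, 9), "C": (-503.5, 16)}
n = 200  # hypothetical sample size

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n))
print(best_aic, best_bic)
```

With these numbers AIC selects model B while BIC selects the smaller model A, since log(200) is greater than 2 and each extra parameter costs more under BIC.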

The following derivation [69] shows the origin of the log(n) term. Given a
model $g_i$, the likelihood of parameter set $\theta_i$ (for K parameters, $\theta_i$ is a vector of length
K) given the data is $\mathcal{L}(\theta_i \mid x, g_i)$. The prior probability for $\theta_i$ is denoted $\pi_i(\theta_i)$. The
marginal likelihood is

$$g_i(x) = \int g_i(x \mid \theta_i)\, \pi_i(\theta_i)\, d\theta_i,$$

or the likelihood of model $g_i$ given the data and the prior probability distribution of
$\theta_i$.
The marginal probability of the data is

$$\int \left[\prod_{i=1}^{n} g(x_i \mid \theta)\right] \pi(\theta)\, d\theta,$$

which is

$$\int [\mathcal{L}(\theta \mid x, g)]\, \pi(\theta)\, d\theta, \qquad (44)$$

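where x represents the data. As sample size increases the likelihood function near
the maximum likelihood estimator $\hat{\theta}$ can be approximated as

$$\mathcal{L}(\theta \mid x, g) = \mathcal{L}(\hat{\theta} \mid x, g) \cdot e^{-\frac{1}{2}(\theta - \hat{\theta})' V(\hat{\theta})^{-1} (\theta - \hat{\theta})},$$

where $V(\hat{\theta})$ is the estimated K x K variance-covariance matrix of the MLE. This
form of the likelihood stems from the sampling distribution of the MLE becoming
multivariate normal as the sample size goes to infinity. Substituting the estimated
form of the likelihood back into Eq. 44 yields

$$\mathcal{L}(\hat{\theta} \mid x, g) \int e^{-\frac{1}{2}(\theta - \hat{\theta})' V(\hat{\theta})^{-1} (\theta - \hat{\theta})}\, \pi(\theta)\, d\theta.$$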


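As the sample size n goes to infinity, the approximation becomes exact, the likelihood
concentrates near $\hat{\theta}$, and the prior is effectively uniform, so $\pi(\theta)$ can be treated as
a constant. The integral is directly related to the underlying multivariate normal
distribution

$$\int (2\pi)^{-K/2} \|V(\hat{\theta})^{-1}\|^{1/2} e^{-\frac{1}{2}(\theta - \hat{\theta})' V(\hat{\theta})^{-1} (\theta - \hat{\theta})}\, d\theta = 1,$$

where $\| \cdot \|$ is the determinant of a matrix. Using the normalizing constant yields

$$\int \left[\prod_{i=1}^{n} g(x_i \mid \theta)\right] \pi(\theta)\, d\theta \approx \mathcal{L}(\hat{\theta} \mid x, g) \left[(2\pi)^{K/2} \|V(\hat{\theta})^{-1}\|^{-1/2}\right].$$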
For a random sample, $V(\hat{\theta})^{-1} = n V_1(\hat{\theta})^{-1}$, where $V_1(\cdot)$ is independent of sample size.
Also, $\|n V_1(\hat{\theta})^{-1}\| = n^{K} \|V_1(\hat{\theta})^{-1}\|$. Thus

$$\int \left[\prod_{i=1}^{n} g(x_i \mid \theta)\right] \pi(\theta)\, d\theta \approx \mathcal{L}(\hat{\theta} \mid x, g) \left[(2\pi)^{K/2} n^{-K/2} \|V_1(\hat{\theta})^{-1}\|^{-1/2}\right].$$

Taking -2 times the log of the right hand side yields the BIC criterion

$$-2 \log(\mathcal{L}(\hat{\theta} \mid x, g)) + K \log(n) - K \log(2\pi) + \log(\|V_1(\hat{\theta})^{-1}\|).$$

The last two terms are dropped because they are dominated asymptotically by the
order log(n) term and the order n log-likelihood term.

2.3.6  Method  of  cross-validation 

The  objective  of  cross-validation  techniques  is  to  evaluate  model  predictive 
accuracy  [67].  The  standard  arrangement  divides  available  data  into  a  training  set 
and  a  testing  set.  The  training  data  is  used  to  fit  a  model,  resulting  in  a  set  of  model 
parameters, $\theta_{cal}$. Test data is used to measure the performance of the calibrated
model. 
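
The train-then-test arrangement can be sketched with a simple hold-out split. Everything in the example is an illustrative assumption: the synthetic quadratic data, the 40/20 split, and the use of polynomial regression as the candidate model family.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 60)
# Synthetic data: a quadratic trend plus Gaussian noise.
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.1, x.size)

# Divide the available data into a training set and a testing set.
idx = rng.permutation(x.size)
train, test = idx[:40], idx[40:]

def holdout_mse(degree):
    """Fit on the training set, then score on the held-out test set."""
    theta_cal = np.polyfit(x[train], y[train], degree)   # calibrated parameters
    residual = y[test] - np.polyval(theta_cal, x[test])
    return float(np.mean(residual**2))

scores = {degree: holdout_mse(degree) for degree in (1, 2, 6)}
print(scores)
```

The straight-line model underfits and scores much worse on the held-out data than the quadratic model. Whether the degree-6 model scores better or worse than the quadratic depends on the particular split.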

2.4 Classifier fusion

2.4.1 Introduction

One goal of this research is the application of an architecture-selection methodology
for the design of a multiple classifier system. This section outlines basic concepts
and taxonomy associated with multiple classifier systems.




2.4.2 Literature


Multiple  classifier  system  (MCS)  literature  can  be  divided  into  two  groups: 
MCS  theory  and  MCS  application.  Roli  leads  a  continuing  research  effort  in  MCS 
theory;  his  fusion  tutorial  [10]  is  an  excellent  source  for  MCS  concepts  and  taxonomy. 
Roli  and  Giacinto’s  book  chapter  [80]  on  design  considerations  for  MCSs  covers 
tradeoffs  in  design  of  the  classifier  ensemble  and  the  fusing  mechanism. 

Combining  outputs  from  a  set  of  different  classifiers  is  one  method  for  the 
development  of  high  performance  classification  systems.  Roli  and  Giacinto  believe 
that 


the  rationale  behind  the  growing  interest  in  MCSs  is  that  the  classical 
approach  to  designing  a  pattern  recognition  system,  which  focuses  on  the 
search  for  the  best  individual  classifier,  has  some  serious  drawbacks.  The 
main  drawback  is  that  the  best  individual  classifier  for  the  classification 
task  at  hand  is  very  difficult  to  identify,  unless  deep  prior  knowledge  is 
available  for  such  a  task.  [80] 

One  key  concept  in  MCSs  is  that  of  complementary  discriminatory  power  of 
classifiers.  That  is,  the  discriminatory  information  of  one  classifier  may  complement 
another  classifier.  Both  classifiers  make  mistakes,  but  the  mistakes  are  not  identical, 
and so the combination of classifiers according to some rule can improve performance
over the individual classifiers.

The  design  of  an  MCS  can  be  split  into  two  parts:  first,  design  of  the  classifier 
ensemble,  and  second,  design  of  the  fusion  function.  The  goal  of  the  first  part  is  to 
create  a  set  of  complementary,  or  diverse,  classifiers.  The  goal  of  the  second  part 
is to create a mechanism that can exploit the complementarity of the classifiers
and  optimally  combine  them.  The  Roli  fusion  tutorial  highlights  several  of  these 
techniques  [10]. 

Figure 13. At the abstract level of fusion each classifier outputs a class label for
each test record. A typical fusor is the majority vote scheme. Here three
classifiers assign class membership to a test record and the decision rule
chooses the final class membership.

Methods used to design the classifier ensemble assume a fixed decision function
and generate a set of complementary classifiers to achieve the best accuracy relative
to the decision function. Roli calls these methods coverage optimization methods,
and some examples are [10]:

• injecting randomness into the classifier training algorithm, e.g. neural
networks with different initializations

• manipulating training data by partitioning the data set or creating
overlapping data sets

• manipulating input features, using feature selection methods and
feeding different features to different classifiers

• manipulating output features, partitioning the set of classes in different
ways, then assigning classifiers to work on a subset of the whole
class structure

Design of the combination function typically assumes a given set of classifiers and
has a goal of finding an optimal combination of decisions from those classifiers. Roli
breaks down decision optimization into three groups: the abstract level (see Fig. 13),
the rank level (see Fig. 14), and the measurement level (see Fig. 15).
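
A minimal sketch of one fusion rule at each of the three levels follows. The majority vote is the abstract-level rule named in Fig. 13. The Borda count and the mean rule are common choices assumed here for illustration; they are not necessarily the rules used in this research, and the function names are hypothetical.

```python
import numpy as np

def majority_vote(labels):
    """Abstract level: each classifier supplies one class label."""
    values, counts = np.unique(labels, return_counts=True)
    return int(values[np.argmax(counts)])   # ties go to the lowest label

def borda_count(rankings, n_classes):
    """Rank level: each classifier supplies classes ordered best to worst."""
    scores = np.zeros(n_classes)
    for ranking in rankings:
        for position, cls in enumerate(ranking):
            scores[cls] += n_classes - 1 - position
    return int(np.argmax(scores))

def mean_rule(outputs):
    """Measurement level: average the output vectors element by element."""
    return int(np.argmax(np.mean(outputs, axis=0)))

# Three classifiers, three classes (0, 1, 2); all values are made up.
print(majority_vote([2, 0, 2]))                              # 2
print(borda_count([[0, 1, 2], [1, 0, 2], [1, 2, 0]], 3))     # 1
print(mean_rule([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.3, 0.4, 0.3]]))                          # 1
```

Each level passes progressively more information to the fusor: a single label, an ordering of all classes, or a full vector of scores.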

The Dasarathy short course on multi-sensor fusion [81] lists two fusion
taxonomies: one based on sensor ensemble configuration or architecture, and one based
on modes of input and output of the sensor ensemble. The first refers to how the
multiple classifiers are connected, whether series, or parallel, or some combination of
the two. The second taxonomy covers much the same ground as the Roli abstract,
rank, and measurement levels of fusion.
Figure 14. At the rank level of fusion each classifier outputs an ordered list of
possible classes for each test record.

Figure 15. At the measurement level of fusion each classifier passes an output
vector to the fusor. The fusor combines the multiple outputs across
each vector element.
For Dasarathy the modes of fusion are divided into data-level, feature-level,
and decision-level. Further, Dasarathy shows that sub-classes within this taxonomy
are formed based on input to and output from the fuser. Data In - Data Out (DAI-DAO)
fusion occurs when data from similar sensors are combined using arithmetic
or  logical  operations;  for  instance,  pixel  intensities  in  multi-spectral  image  data. 
An  example  of  Features  In  -  Features  Out  (FEI-FEO)  fusion  is  the  fusion  of  two 
inputs,  one  an  infrared  sensor  measuring  cross-section,  and  the  other  a  range  radar 
measuring  target  depth.  The  fused  output  is  a  volumetric  feature  of  the  target.  The 
most  common  fusion  category  is  Features  In  -  Decision  Out  (FEI-DEO).  Here  the 
recognition  tool  accepts  features,  then  makes  a  classification  decision.  At  the  top 
level  of  fusion  categories  is  Decisions  In  -  Decision  Out  (DEI-DEO)  fusion.  Voting 
schemes  fit  into  this  category. 

Recent applications of MCS in the area of pattern recognition include
Chan’s fusion of dualband forward-looking infrared (FLIR) target data [82]. Other
fusion applications include Rizvi [83], which reports on various fusion techniques in a
FLIR ATR application, and Song [84], which studies biomedical image identification
using fused contextual information.

A series of AFIT master’s research efforts investigated fusion methods, correlation
effects,  and  performance  metrics  [85,  86,  87,  88].  Storm  [85]  introduced  a  synthetic 
fusion  testing  environment  and  studied  the  effects  of  data  correlation  on  three  fusion 
techniques.  Leap  [86]  extended  Storm’s  work  by  examining  the  effects  of  sample  size 
as  well  as  correlation.  Clemans  [87]  increased  the  number  of  classifiers  in  the  ensemble 
to  three  from  two  and  searched  for  the  optimal  ensemble  given  various  experiment 
settings.  Mindrup  [88]  extended  Leap’s  work  by  allowing  a  non-declaration  option 
from  his  classifiers,  applying  a  cost  function  and  finding  the  optimal  fusion  method. 

A rejection, or non-declaration, parameter defines a region of class ambiguity
where a classifier labels test records “unknown” [89]. A Bayes-optimal decision rule
which assigns test records to the class with the maximum a posteriori probability
may be used. Rejection improves classification accuracy while decreasing
misclassification errors by allowing the classifier to label “unknown” difficult-to-identify test
records [89].

Using a rejection option creates a tradeoff between improved classification
performance and the cost of gathering more information if a non-declaration is made.
Chow [89] finds the optimal rejection threshold for the case where the costs of
misclassification, rejection, and correct classification are equivalent across data
classes; a single threshold then applies to every class. Fumera et al. [11] apply
class-specific rejection thresholds to account for varying class prior probabilities.
Fumera proves that using multiple thresholds achieves classification performance
equal to or better than that of Chow's single rejection threshold.
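
The two rejection schemes can be illustrated with a minimal sketch. It assumes
the classifier already supplies estimated posterior probabilities; the function
name and the threshold values are hypothetical and are not taken from [89] or [11].

```python
def classify_with_reject(posteriors, thresholds):
    """Return the index of the declared class, or None for a non-declaration.

    posteriors: estimated class posterior probabilities for one test record.
    thresholds: one minimum-confidence value per class; using the same value
                for every class gives a single-threshold rule.
    """
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    if posteriors[best] < thresholds[best]:
        return None  # reject: the record is labeled "unknown"
    return best


single = [0.8, 0.8]      # one threshold shared by all classes (illustrative)
per_class = [0.8, 0.6]   # class-specific thresholds (illustrative)

print(classify_with_reject([0.70, 0.30], single))     # None
print(classify_with_reject([0.30, 0.70], single))     # None
print(classify_with_reject([0.30, 0.70], per_class))  # 1
```

Lowering the threshold for one class only, as in the last call, lets that class be
declared more readily without loosening the rule for the other class.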

Several authors use a loss function to set classification and rejection rules in a
Bayes-optimal classification strategy (Chow [89], Devijver and Kittler [90], Fumera
et al. [11], and Haspert [91]). Minimizing the loss function optimizes classifier
performance for given costs of rejection, classification error, and correct
classification, all expressed in equivalent units. Setting these relative costs
places the warfighter in the position of formally weighing the cost of a fratricide
incident against the costs of non-declaration and correct identification, a position
the warfighter may not desire [4].

Laine's AFIT PhD research [15] presents a CID framework with a reject option
that optimizes classification performance without resorting to a cost-based loss
function.

2.5 Summary

This chapter presented relevant background for the investigation of an HMM-based
MCS in a CID application. HMM theory and application were described, HRR
signature processing and its use in classification were covered, model selection
theory was reviewed with specific attention to HMMs, and the fusion of multiple
classifiers and rejection theory were reviewed.

3. HMM Classifier Development

3.1 Introduction

This chapter considers the development of HMM classifiers operating on HRR
feature data. An introductory section illustrates the application of a simple
HMM-based classifier to sequences of genetic data. This example supports the
theory of the previous chapter.

Additionally, model selection based on the theory presented in Chapter 2 is
implemented using HMMs and synthetic data.

3.2  Introductory  HMM  Classifier 

This section presents an example of a discrete hidden Markov model used as
a data sequence classifier. The classification of four-state genetic codes of varying
lengths from two genetic groups, human and mouse, is investigated using discrete
HMMs employing different numbers of hidden states. The impact of the number of
hidden states on classification accuracy is explored using test and validation data
sets [92].

In  this  application  the  complexity  of  HMMs  is  explored  using  the  number  of 
hidden  states.  The  hidden  state  space  and  observation  state  space  are  assumed  to 
be  fully-connected.  Hence,  the  Markov  chain  may  transition  from  any  state  to  any 
state  with  probability  greater  than  zero,  and  each  state  may  produce  any  symbol 
from  the  discrete  observation  alphabet  with  probability  greater  than  zero. 

The hidden state transition matrix A and the observation distribution matrix
B are initialized randomly before re-estimation using the Baum-Welch algorithm
given training sequences. The implementation of the algorithm in MATLAB® uses
code from Murphy's HMM toolkit [93].

3.2.1 Methodology


The initial concept of the project was to employ discrete HMMs to classify
genetic sequences into one of two classes. Through research on biological sequencing
in Durbin's text [94], and online at the University of California at Santa Cruz's
(UCSC) Computational Biology website [95], a satisfactory set of data was found.

The  classification  data  consists  of  77  randomly  selected  pairs  of  aligned  genetic 
sequences  of  DNA  from  chromosome  10  of  the  mouse  and  human  species.  This 
data  suits  the  purpose  of  this  effort  for  a  number  of  reasons.  First,  the  genetic 
data  describes  two  mammals  and  comes  from  identical  locations  on  chromosome  10 
of  the  respective  DNA.  Therefore,  differences  in  the  sequence  data  are  a  function 
of  the  species  and  not  of  the  genetic  location  (either  within  the  chromosome  or 
across  chromosomes).  Second,  the  task  of  aligning  the  sequences  has  already  been 
accomplished;  roughly,  this  means  each  sequence  is  of  the  same  length.  Any  disparity 
in  length  between  the  human  and  mouse  pair  is  made  up  with  space  holders  (later 
removed)  such  that  sub-sequences  within  the  larger  sequences  are  aligned  at  mutually 
shared  locations.  Third,  the  data  is  naturally  presented  as  a  classification  data  set 
with  one  record  for  a  human  sequence  and  another  for  its  aligned  mouse  partner. 

A  series  of  transforms  produces  a  set  of  useable  inputs  for  the  MATLAB® 
HMM  functions.  The  original  data  as  downloaded  from  the  UCSC’s  website  consists 
of  77  paired  sequences  of  human  and  mouse  DNA  in  a  flat  text  file;  example  sequences 
are  shown  in  Fig.  16.  The  desired  input  to  the  MATLAB®-based  HMM  functions 
is  a  set  of  vectors  representing  the  human  sequence  records  and  a  set  of  vectors 
representing  the  mouse  sequence  records.  To  reach  this  goal,  the  data  is  separated 
into  human  and  mouse  data  files,  converted  from  text  to  integers,  and  converted  into 
sub-sequences based on the location of the space-holding characters. There are 687
sub-sequences for each class of data.

[Figure 16 listing: numbered records, each a header line followed by a pair of
aligned sequences, with "-" as the space holder. For example:

4 chr10_random 19789 19802 chrX 46668995 46669008 + 825
TACTAAAAAACAAA
TGTTAAAAAAAAAA]

Figure 16. Example gene sequences from chromosome 10 of human and mouse
DNA.
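
The text-to-integer conversion and the split at space-holding characters might look
like the following sketch. The symbol coding and the function name are assumptions
made for illustration; the original processing was done in MATLAB and its exact
coding is not stated.

```python
# Assumed symbol coding; the integer labels themselves are arbitrary.
CODE = {"A": 1, "C": 2, "G": 3, "T": 4}


def to_subsequences(aligned, gap="-"):
    """Split an aligned sequence at gap characters and map bases to integers."""
    pieces = aligned.upper().split(gap)
    # Consecutive gaps produce empty pieces, which are discarded.
    return [[CODE[base] for base in piece] for piece in pieces if piece]


print(to_subsequences("GTTT-ATGG"))  # [[3, 4, 4, 4], [1, 4, 3, 3]]
```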


The  goal  of  the  effort  is  to  use  HMMs,  trained  with  sequences  of  known  origin 
(human/mouse),  to  classify  “unknown”  sequences  into  either  the  human  or  mouse 
species.  To  accomplish  this  goal  two  HMMs  are  trained.  One  model  is  trained  using 
human  sequences  and  one  model  is  trained  using  mouse  sequences.  The  classification 
of  a  particular  sequence  results  from  a  comparison  of  model  likelihoods.  Given  a  test 
sequence,  if  the  human  model  is  more  likely  than  the  mouse  model  to  have  produced 
the  sequence,  then  the  sequence  is  classified  as  human,  with  the  converse  true  for  a 
mouse  sequence. 
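
The likelihood comparison can be sketched with a scaled forward algorithm. This is
a generic illustration, not the toolkit code of [93]; the two toy models and their
parameters are invented for the example and are not trained on the genetic data.

```python
import math


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    pi[i]   : initial probability of hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability that state i emits symbol k (symbols are 0-based)
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def classify(obs, models):
    """Return the label of the model with the largest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Toy two-state models (hypothetical parameters): one tends to stay in a
# state, the other tends to alternate between states.
emit = [[0.8, 0.2], [0.2, 0.8]]
toy_models = {
    "class_a": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], emit),
    "class_b": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], emit),
}

print(classify([0, 0, 0, 0, 1, 1, 1, 1], toy_models))  # class_a
print(classify([0, 1, 0, 1, 0, 1, 0, 1], toy_models))  # class_b
```

Working with log-likelihoods keeps the comparison numerically stable for long
sequences, where raw likelihoods underflow.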

While classifying unknown sequences is the goal, insight into the relationship
between model complexity and model performance is also sought. A series of
experiments is devised to explore this relationship. Two HMMs with n hidden states
are trained using 400 randomly selected class-specific data records. For the
experiment performed here, n = [2 3 4 5 6 7 8 9 10 15 20 30]. Each HMM is
tested using 100 randomly selected records from each class of data. Class
membership is determined by comparing the likelihoods produced by the two
class-specific HMM classifiers when presented with a test record.

Figure 17. Plot of probability of correct selection versus number of hidden states.

3.2.2  Results 

The experimental results (see Fig. 17) show that maximum classification
accuracy is achieved with three hidden states in the models. Performance drops off
rapidly as hidden states are added, indicating that a simple model (i.e., three hidden
states) is preferable to a more complex model.

This experiment shows that HMMs form a useful tool for classifying sequenced
discrete-valued data.

3.3  Model  Selection  with  HMMs 

This section investigates model complexity in HMMs using MATLAB®. First,
discrete HMMs operating on discrete data are considered. Then, a comparison of
discrete and continuous HMMs operating on continuous data is performed. Finally,
a multi-variate Gaussian HMM is used to classify sequenced multi-dimensional data.

3.3.1  Complexity  in  Discrete  HMMs 

Discrete hidden Markov models have both a discrete state space and a discrete
observation space. The following experiment examines measures of complexity in a
discrete HMM-based classifier. The experiment applies various measures of
complexity to identify the most appropriate model given an ensemble of potential
models. Data is generated from a stochastic model of known complexity. The data is
then used to train and test a discrete HMM-based classifier. Based on classifier
output, a model selection technique is applied. Then a comparison is made between
the controlled user-defined data complexity and the suggested model complexity based
on model outputs.

The  experiment  includes  the  following  steps: 

•  Choose  experiment  parameters.  Here  the  parameters  of  the  stochastic  model 
used  to  generate  the  data  are  specified. 

•  Generate  training  and  testing  data  based  on  the  parameter  set 

•  Generate  initial  discrete  HMMs  of  varying  complexity 

•  Train  the  HMMs  using  a  training  data  set 

•  Test  the  HMMs  using  a  testing  data  set 

•  Reduce  the  HMM  state  space  by  one  state  and  repeat  the  training/testing 
sequence 

The output of an experiment is a mean log-likelihood, obtained by averaging the
log-likelihoods produced by the trained HMMs over the testing records. Thus, as the
number of testing records increases, the mean log-likelihood becomes a better
estimator of model performance.

Figure 18 shows the processes of the complexity experiment. The experiment
begins with a highly complex discrete HMM (20 hidden states), and the complexity
is iteratively reduced by one state. The state to be removed is chosen based on its
relative probability of in-transitioning. This choice of rule is arbitrary. The decision
is made by summing over each column of the hidden state transition probability
matrix. The column sums are compared, and the state associated with the minimum
value is chosen as the state to be removed.

Figure 18. HMM model complexity experiment set-up. Experimental parameters,
key functions with input/outputs listed, and looping constructs are
shown.
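
The state-removal rule can be sketched as follows. The column-sum selection follows
the description above; how the remaining transition probabilities are handled after
removal is not stated, so the row renormalization here is an assumption, as is the
function name. The matching row of the observation matrix would also be dropped.

```python
def remove_least_entered_state(A):
    """Drop the state with the smallest column sum of transition matrix A.

    Returns (index of the dropped state, reduced transition matrix).
    """
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    drop = col_sums.index(min(col_sums))
    keep = [s for s in range(n) if s != drop]
    reduced = []
    for i in keep:
        row = [A[i][j] for j in keep]
        total = sum(row)
        # Assumption: rows are renormalized to sum to one; a row that led
        # only to the dropped state falls back to a uniform distribution.
        if total > 0:
            reduced.append([v / total for v in row])
        else:
            reduced.append([1.0 / len(keep)] * len(keep))
    return drop, reduced


A = [[0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6],
     [0.4, 0.1, 0.5]]
drop, A_reduced = remove_least_entered_state(A)
print(drop)  # 0, since column 0 has the smallest sum (about 0.7)
```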

Data generation follows a die-rolling paradigm. A sequence of die rolls produces
a series of discrete observations. The stochastic data generation process uses 3 dice,
each with 5 sides. A transition matrix of a Markov chain defines the probability of
using a specific die with each die roll. An observation probability matrix defines
the die bias. Two classes of data are generated. Each class has a different Markov
chain transition matrix but uses the same observation distribution matrix. The data
generation process forms training and testing sequences by choosing a die, rolling
it, recording the result, and repeating the process until a user-defined sequence
length is reached. Experiment and data generation settings are shown in Table 5.
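
A minimal sketch of the die-rolling generator, using the class 1 settings of Table 5.
The uniform choice of the first die and the function name are assumptions; the
original generator was written in MATLAB.

```python
import random


def generate_sequence(length, A, B, rng):
    """Generate die rolls: a Markov chain picks the die, the die picks the face."""
    n_dice = len(A)
    die = rng.randrange(n_dice)  # assumption: the first die is chosen uniformly
    rolls = []
    for _ in range(length):
        face = rng.choices(range(len(B[die])), weights=B[die])[0]
        rolls.append(face + 1)   # faces are numbered 1 to 5
        die = rng.choices(range(n_dice), weights=A[die])[0]
    return rolls


# Class 1 transition matrix and the shared observation matrix from Table 5
A1 = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.4, 0.1, 0.5]]
B = [[0.6, 0.1, 0.05, 0.2, 0.05],
     [0.05, 0.6, 0.1, 0.2, 0.05],
     [0.05, 0.1, 0.5, 0.15, 0.2]]

rng = random.Random(0)
train = [generate_sequence(30, A1, B, rng) for _ in range(100)]
```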

Training data are used to train discrete HMMs of varying complexity (i.e.,
number of hidden states). Trained HMMs are given test sequences from both classes
of data. The class-specific discrete HMMs produce log-likelihoods when given test
sequences. These likelihoods are compared and class assignment is made according
to the most likely model.

Table 5. Experimental settings for two-class complexity experiment using discrete
HMMs

parameter                        class 1              class 2
Markov chain transition matrix   [ .2  .7  .1 ]       [ .1  .1  .8 ]
                                 [ .1  .3  .6 ]       [ .7  .1  .2 ]
                                 [ .4  .1  .5 ]       [ .1  .6  .3 ]
observation matrix               [ .6   .1   .05  .2   .05 ]   (both classes)
                                 [ .05  .6   .1   .2   .05 ]
                                 [ .05  .1   .5   .15  .2  ]
training records                 100
training seq length              30
test records                     1000
test seq length                  [5 10 15 20 25 30]
replications                     5

Figures 19 and 20 show results from the discrete HMM complexity experiment.
A marked jump in classification accuracy occurs when discrete HMM classifiers of
order 3 (3 hidden states) are used. Classification performance remains relatively
constant as model complexity increases. The Akaike and Bayesian information criteria
(AIC and BIC) concur on the appropriate model complexity (minimum at 3 hidden
states).
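
For reference, the two criteria can be computed from a model's log-likelihood as in
the sketch below. It uses the standard definitions of AIC and BIC; the parameter
count for a fully connected discrete HMM and the choice of sample size are
assumptions, and the dissertation's own computation may differ in detail.

```python
import math


def n_free_params(n_states, n_symbols):
    """Free parameters of a fully connected discrete HMM (assumed counting).

    Initial distribution, transition rows, and emission rows each lose one
    degree of freedom because their probabilities sum to one.
    """
    return (n_states - 1) + n_states * (n_states - 1) + n_states * (n_symbols - 1)


def aic(log_lik, k):
    """Akaike information criterion: smaller is better."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n_obs):
    """Bayesian information criterion: smaller is better."""
    return -2.0 * log_lik + k * math.log(n_obs)


print(n_free_params(3, 5))  # 20
```

Both criteria penalize the log-likelihood by the number of parameters, so a more
complex model is selected only if it fits the data sufficiently better.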

One measure of classifier performance is the Receiver Operating Characteristic
(ROC) curve. ROC curves have been applied to many dichotomous decision
problems [96]. Alsing [97] reviews ROC curve analysis in automatic target recognition
research. A ROC curve is used to estimate classifier performance given test data.
Typically, a ROC curve shows the range of false-positive/true-positive coordinates
generated by varying a decision threshold from conservative to aggressive values.
A conservative setting minimizes the number of false-positives (or false alarms) at
the cost of reduced true-positive performance. An aggressive setting maximizes the
true-positive performance at the cost of increased false-positives.
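
The threshold sweep can be sketched as follows. The score is assumed to be a scalar
such as the difference between the two class-specific log-likelihoods, with larger
values favoring the positive class; the function name and example scores are
hypothetical.

```python
def roc_points(scores_pos, scores_neg):
    """Sweep a decision threshold from conservative to aggressive.

    scores_pos: scores of test records that truly belong to the positive class
    scores_neg: scores of test records that truly belong to the negative class
    Returns a list of (false-positive rate, true-positive rate) pairs.
    """
    thresholds = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]  # most conservative setting: nothing declared positive
    for t in thresholds:
        tpr = sum(s >= t for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return points


points = roc_points([2.0, 1.0, 0.5], [0.7, -1.0, -2.0])
```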

Figure 19. Discrete HMM complexity experiment results. On the left, classification
accuracy results by sequence length across model complexity. On
the right, AIC and BIC measures for model selection.

Figure 20 plots several ROC curves for the discrete HMM classifier given
different sequence lengths. A longer sequence length means that the classifier has
more observation data to consider before classification is made. The HMM classifier
that produces the ROC curves shown in Fig. 20 is of order 3 (i.e., the "best" model
complexity).

The right-hand plot of Fig. 20 shows the distribution of observations in the
training data for each class of data. A statistical classifier that did not account for
the order of observation would have a difficult time distinguishing between the two
classes.


3.3.2  Complexity  in  Continuous  HMMs 

Continuous hidden Markov models have a discrete state space and a continuous
observation space. The following experiment examines measures of complexity in a
Gaussian HMM-based classifier, i.e., the observations emitted from each hidden
state are Gaussian distributed. The experiment applies various measures of complexity
to identify the most appropriate model given an ensemble of potential models. Data
is generated from a stochastic model of known complexity. The data is then used to
train and test a Gaussian HMM-based classifier. Based on classifier output, a model
selection technique is applied. Then a comparison is made between the controlled,
user-defined data complexity and the suggested model complexity based on model
outputs.

Figure 20. ROC curves for various sequence length settings. Discrete probability
distribution for the 2 class problem.

The  experiment  includes  the  following  steps: 

•  Choose  experiment  parameters.  Here  the  parameters  of  the  stochastic  model 
used  to  generate  the  data  are  specified. 

•  Generate  training  and  testing  data  based  on  the  parameter  set 

•  Generate  initial  Gaussian  HMMs  of  varying  complexity 

•  Train  the  HMMs  using  a  training  data  set 

•  Test  the  HMMs  using  a  testing  data  set 

•  Reduce  the  HMM  state  space  by  one  state  and  repeat  the  training/testing 
sequence 

The output of an experiment is a mean log-likelihood, obtained by averaging the
log-likelihoods produced by the trained HMMs over the testing records. Thus, as the
number of testing records increases, the mean log-likelihood becomes a better
estimator of model performance.

Table 6. Experimental settings for two-class complexity experiment using
continuous HMMs

parameter                        class 1              class 2
Markov chain transition matrix   [ .2  .7  .1 ]       [ .3  .7  0  ]
                                 [ .1  .3  .6 ]       [ .4  .0  .6 ]
                                 [ .4  .1  .5 ]       [ .3  .4  .3 ]
Gaussian observation matrix      [ 1  2  5  ]         [ 1  2  5  ]
                                 [ 1  2  .5 ]         [ 1  2  .5 ]
training records                 100
training seq length              40
test records                     1000
test seq length                  [5 10 15 20 25 30]
discretization                   [5 10 30]
replications                     5

Data generation uses a Markov chain of 3 states to determine the observation
distribution from which the next observation is drawn. Table 6 shows the
experimental settings. For example, if the Markov chain is in state 1, the observation
is randomly drawn from a Gaussian distribution with mean = 1 and variance = 1.
Two classes of data are generated. Each class has a different Markov chain transition
matrix but uses the same observation distribution matrix.

Training data are used to train Gaussian HMMs of varying complexity (i.e.,
number of hidden states). Trained HMMs are given test sequences from both classes
of data. The class-specific HMMs produce log-likelihoods when given test sequences.
These likelihoods are compared and class assignment is made according to the most
likely model.

Discrete HMMs are trained using the same continuous data after quantizing
using a k-means clustering algorithm. To compare performance of discrete versus
Gaussian HMMs, several quantization levels are used (k = 5, 10, and 30). When
k = 5 the discrete observation space has 5 symbols. Figure 21 shows classification
accuracy as a function of model complexity at a series of sequence length settings.
Notice the improved performance of the Gaussian HMM over the discrete HMM and
the marked peak at HMMs of order 3.

Figure 21. Classification performance of the Gaussian and discrete HMM classifiers
at different sequence length settings. The discrete HMM classifier is
broken down into three quantization levels: 5, 10, and 30.
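
The quantization step that turns continuous observations into discrete symbols can
be sketched with a one-dimensional k-means (Lloyd's algorithm). The initialization,
iteration count, and function names are assumptions for illustration; the
clustering routine actually used is not specified.

```python
import random


def kmeans_1d(values, k, rng, iters=50):
    """Lloyd's algorithm on scalars; returns the k cluster centers, sorted."""
    centers = rng.sample(values, k)  # assumption: centers start at data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)


def quantize(seq, centers):
    """Map each continuous observation to the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in seq]


data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
centers = kmeans_1d(data, 2, random.Random(0))
symbols = quantize([0.3, 9.5], centers)
```

With k = 5 the discrete HMM then sees an observation alphabet of 5 symbols, as
described above.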

Figures 22 and 23 show results from the Gaussian HMM complexity
experiment. A marked jump in classification accuracy occurs with HMM classifiers of
order 3 (3 hidden states). Classification performance decreases as model complexity
increases. The Akaike and Bayesian information criteria (AIC and BIC) concur on
the appropriate model complexity (minimum at 3 hidden states).

Figure 23 plots several ROC curves for the Gaussian HMM classifier given
different sequence lengths. A longer sequence length means that the classifier has more
observation data to consider before classification. The HMM classifier that
produced the ROC curves shown in Fig. 23 is of order 3 (i.e., the "best" model
complexity).

Figure 22. Continuous HMM complexity experimental results. On the left,
classification accuracy results by sequence length across model complexity.
On the right, AIC and BIC measures for model selection.

The  right-hand  plot  of  Fig.  23  shows  the  distribution  of  observations  in  the 
training  data  for  each  class  of  data.  A  statistical  classifier  that  did  not  account  for 
the  order  of  observation  would  have  a  difficult  time  distinguishing  between  the  two 
classes. 


3.3.3 Multi-dimensional Gaussian Data

This section uses synthetic, multi-variate Gaussian data to show the utility of
Gaussian HMMs in classifying sequenced, multi-variate data.

Table 7 lists the experimental parameters. Data is generated in a controlled
manner using a specified number of states (5). Data in each state is generated from
3-dimensional random normal distributions. The mean and variance of each normal
distribution depend on the state and data class.

Gaussian HMMs with 5 hidden states are trained using a sequence of samples
(10) from each 3-dimensional Gaussian observation state. Thus each observation
sequence is of length 50. Within an observation sequence the data is ordered by the
distribution state. For example, the first 10 observations in the sequence are from
state 1, the next 10 are from state 2, etc. In this fashion an ordering is forced on
the data generation process. A sample of the two-class data is shown in Fig. 24.

Figure 23. ROC curves for various sequence length settings. Continuous
probability densities for the 2 class problem.

Table 7. Experimental settings for two-class complexity experiment using
multi-variate Gaussian HMMs

parameter                  class 1                    class 2
data mean by state         [ 0  1.5  3  4.5  6 ]      [ .1  1.6  3.1  4.6  6.1 ]
                           [ 0  0    0  0    0 ]      [ .1  .1   .1   .1   .1  ]
                           [ 0  0    0  0    0 ]      [ .1  .1   .1   .1   .1  ]
data std dev by state      [.4 .4 .4 .4 .4]           [.4 .4 .4 .4 .4]
samples per state          10
training records           10
test records               100
mean separation settings   [.1 .12 .15 .2 .25 .3]
replications               5

Figure 24. Two classes of data shown at six different mean separation settings.

Separate training and testing data are generated for each of the two data
classes. The data classes are defined by the mean and standard deviation of their
respective multi-variate Gaussian observation distributions. The experiment
examines the ability of the Gaussian HMMs to distinguish between the two classes while
the separation between the means is decreased from 0.3 to 0.1.
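
The ordered data generation can be sketched as follows, using the class 1 means and
standard deviation of Table 7. Two points are assumptions: the three dimensions are
drawn independently, and the class 2 means are obtained by adding the mean
separation to every class 1 mean (which reproduces the Table 7 values at a
separation of 0.1). The function name is hypothetical.

```python
import random


def make_record(means, std, samples_per_state, rng):
    """One observation sequence: samples from each state, in state order.

    means[d][s] is the mean of dimension d in state s (layout of Table 7).
    """
    n_dims, n_states = len(means), len(means[0])
    record = []
    for s in range(n_states):
        for _ in range(samples_per_state):
            # Assumption: dimensions are independent with a common std dev.
            record.append([rng.gauss(means[d][s], std) for d in range(n_dims)])
    return record


class1_means = [[0.0, 1.5, 3.0, 4.5, 6.0], [0.0] * 5, [0.0] * 5]
separation = 0.1
class2_means = [[m + separation for m in row] for row in class1_means]

rng = random.Random(1)
record = make_record(class1_means, 0.4, 10, rng)  # 50 observations of dimension 3
```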

Figure 25 shows classifier performance using ROC curves at each of the mean
separation settings. At the closest setting (0.1), the classifier narrowly outperforms
the chance line (diagonal line). Perfect classification occurs with mean separation of
0.3. The results of this experiment point the way to implementation of a multi-variate
Gaussian HMM in the application of CID using features from HRR signatures.



Figure  25.  ROC  curves  at  six  different  mean  separation  settings. 

3.4  Development  of  HMM-based  CID  System 

One  goal  of  the  research  is  the  development  of  a  time-series  classifier  operating 
on  sequenced  observations  of  targets.  The  hidden  Markov  model  is  the  time-series 
classifier.  In  the  following  subsections  data,  design,  and  fusion  decisions  related  to 
the  implementation  of  an  HMM-based  classifier  are  described. 

Figure  26  provides  an  experiment  flowchart  which  shows  the  major  processes 
used  in  the  development  of  the  HMM-based  classifier.  Feature  data  is  derived  from 
MSTAR  SAR  chips,  and  HMM  classifiers  are  trained  with  class-specific  data  from 
a  segregated  training  data  set.  The  trained  HMMs  process  test  data,  and  a  fuser 
combines  the  output  from  the  class-specific  HMMs  and  assigns  class  membership. 

The following sections describe design options in these areas: HRR-derived
features, HMM state space structure, HMM observation distributions, HMM
training, HMM testing, and fusion rule.

Figure 26. Set-up for a standard HMM experiment using features derived from
MSTAR SAR chips and employing a fusion rule to combine outputs
from multiple HMM classifiers.


3.4.1 Data and Features


A  time-series  classifier  using  data  derived  from  targets  imaged  at  a  series  of 
aspect  angles  may  leverage  aspect-dependent  information  in  the  effort  to  distinguish 
targets. 

The classifier development described here uses data from the MSTAR program
publicly-available data set [65], a subset of Collection 1 taken Sep 1995 at the
Redstone Arsenal, Huntsville, AL by the Sandia National Laboratory STARLOS
sensor (airborne), operating at X-band in one foot resolution spotlight mode. Three
types of ground targets are in the target set: T-72 main battle tank, and BMP-2
and BTR-70 armored personnel carriers. Figure 27 shows photographs and example
SAR images of the ground targets in the data collection.

The data is divided into two sets. The first is used for training and is collected
at a sensor-to-target depression angle of 17 degrees. The training set holds
approximately 230 SAR chips of each target type with each chip representing a SAR image
taken at a specific target aspect angle relative to the airborne sensor bore sight. The
test set is collected at a 15 degree depression angle, presenting a signature different
from that of the training set [98]. Approximately 195 SAR chips of each target type
are in the test set.

Figure 27. Target images and SAR chips of T-72, BTR-70, and BMP-2 vehicles.

The SAR chips are processed according to the steps of Section 2.2.6. A mean
HRR signature is produced from each SAR chip. This signature, also called a
profile, is a vector of length 162. Ordering the profiles by the sensor-target aspect angle
creates a target HRR signature with respect to relative target azimuth. There are
holes in the aspect data; approximately 230 chips cover 360 degrees of target
azimuth. Figure 28 displays the available HRR profiles of each target (first column),
the available profiles with missing data (second column), and interpolated profiles
(third column). Profile data is linearly interpolated to 1 degree resolution using the
available data, thus filling in the missing data and achieving a uniform spacing of
observation data across target type.

Figure 28. Available and interpolated HRR profiles for three MSTAR
target types. The training data have a 17° depression angle.

Several methods are used to reduce the dimensionality of the 162-element HRR
profile into manageable features. The first method is a simple maximum value rule
applied to a series of adjacent bin ranges of the HRR profile. Each SAR chip is
preprocessed (by the Sensors Directorate of AFRL) to place the target in the center
of the chip. Figure 28 shows that most of the variability in the HRR signatures is
confined to the middle region of the 162-element profiles. Further inspection showed
that peaks of significant magnitude in HRR signatures for the three targets are
located between range bins 62 and 100 of the 162-element profile. This range is
divided into 7 range windows: 62-67, 68-73, 74-78, 79-83, 84-88, 89-94, and 95-100.
The feature vector $x^{(\max)}$ is determined by the maximum values of the HRR
profile within each of the seven range windows:

\[ x_i^{(\max)} = \max_{j \in w_i} p_j \quad \text{for } i = 1, 2, \ldots, 7 \tag{45} \]

where $p$ is the HRR profile and $w_i$ is the $i$th range window of the HRR profile. A
mean value feature rule is also used and is designated $x^{(\mathrm{mean})}$.
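
The window features can be sketched as follows. The range windows are those listed
in the text; treating them as inclusive 1-based bin numbers, and the function names,
are assumptions.

```python
# Range windows from the text, as inclusive 1-based bin numbers.
WINDOWS = [(62, 67), (68, 73), (74, 78), (79, 83), (84, 88), (89, 94), (95, 100)]


def mean(values):
    return sum(values) / len(values)


def window_features(profile, stat=max):
    """One feature per range window; stat=max gives the maximum value rule."""
    # Assumption: bins are numbered from 1, so bin b is profile[b - 1].
    return [stat(profile[lo - 1:hi]) for lo, hi in WINDOWS]


# Synthetic profile whose value in bin b is simply b, for easy checking
profile = [float(b) for b in range(1, 163)]
print(window_features(profile))        # [67.0, 73.0, 78.0, 83.0, 88.0, 94.0, 100.0]
print(window_features(profile, mean))  # [64.5, 70.5, 76.0, 81.0, 86.0, 91.5, 97.5]
```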

The next feature set applies a discrete Fourier transform to the HRR profile:

\[
x_k^{(\mathrm{fft})} = \sum_{j=1}^{N} p_j \, \omega_N^{(j-1)(k-1)} \quad \text{for } k = 1, 2, \ldots, 6, \tag{46}
\]

where $\omega_N = e^{(-2\pi i)/N}$ and $N = 162$. Most information is captured in the low
frequencies of the transform, so the feature vector retains only the first six values.
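
A minimal sketch of this feature set is given below. The function name `fft_features` is hypothetical. NumPy's `np.fft.fft` uses the same sign convention with zero-based indices. Taking the magnitude of each complex coefficient is an assumption, since the text does not state how the complex values are converted to features.

```python
import numpy as np

def fft_features(profile, n_keep=6):
    """Return the first n_keep DFT coefficients of an HRR profile as magnitudes."""
    profile = np.asarray(profile, dtype=float)
    spectrum = np.fft.fft(profile)       # complex DFT, length N = len(profile)
    # Assumption: magnitudes are used; the source does not specify this step.
    return np.abs(spectrum[:n_keep])
```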

Another  feature  set  is  formed  using  principal  component  analysis  (PCA).  A 
translation  of  the  middle  portion  of  the  HRR  profiles  (bins  62-100)  across  360  degrees 
of  aspect  angle  via  principal  component  analysis  reduces  dimensionality  from  39  to 
10 when the first 10 principal components are retained. The component scores form
the new feature space.

Figure 29. Full HRR profile (interpolated) and related feature sets for the
BTR-70 target. Training data for a 17° depression angle are
shown.

Given a 360 by 39 matrix $p$, where each row is the middle portion of a full
HRR profile at a given aspect angle, let $\tilde{p}$ be the mean-corrected matrix such that
$\tilde{p}_{ij} = p_{ij} - \mu_j$, where $\mu_j = (1/360) \sum_i p_{ij}$. Proceeding down the columns increases
the aspect angle at which the HRR data are collected. Let $C$, a 39 by 39 matrix,
be the normalized, sample variance-covariance matrix of $\tilde{p}$. Let $A$ be the matrix of
the eigenvectors associated with the ten largest eigenvalues of $C$. The component
scores which form the new feature space result from multiplying the mean-corrected
matrix $\tilde{p}$ by $A$:

\[
x^{(\mathrm{pca})} = \tilde{p} A. \tag{47}
\]
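
The PCA steps can be sketched as below. The function name `pca_scores` is hypothetical, and the sample covariance with an n - 1 divisor (`np.cov`) is used as one reading of "normalized, sample variance-covariance matrix"; that reading is an assumption.

```python
import numpy as np

def pca_scores(profiles, n_components=10):
    """Project mean-corrected profiles onto the leading principal components."""
    p = np.asarray(profiles, dtype=float)        # shape (360, 39) in the text
    p_centered = p - p.mean(axis=0)              # subtract each column mean
    C = np.cov(p_centered, rowvar=False)         # 39 x 39 sample covariance
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    A = eigvecs[:, order]                        # eigenvectors of the largest eigenvalues
    return p_centered @ A, A                     # component scores and loadings
```

Because `np.linalg.eigh` returns eigenvalues in ascending order, the indices are reversed before the leading components are selected.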

Figure 29 shows the HRR-derived feature sets for the BTR-70 target. The
full profiles $p$ are interpolated to 1 degree resolution in aspect. The feature sets
shown are: $x^{(\max)}$, the maximum value within 7 range windows; $x^{(\mathrm{fft})}$, the first 6
frequencies of the discrete Fourier transform; and $x^{(\mathrm{pca})}$, the component scores after
translation using the first 10 principal components.

3.4.2 HMM Topology


A hidden Markov model $\lambda$ is parameterized by the hidden Markov chain transition
matrix $A$, an observation distribution matrix $B$, and an initial state probability
vector $\pi$. The design of an HMM involves several decisions regarding topology. Of
critical importance is the number of hidden states in the Markov chain, called the
order of the HMM. Given $S$ states, the transition matrix is $S \times S$. Thus, the number
of parameters in the HMM increases non-linearly (on the order of $S^2$) with $S$.

The state space may be fully-connected, in which case each state transitions
to any other state in one step with probability greater than zero. Alternatively, the
connectivity of the state space may be restricted. A specific instance is the “left-right”
model, where a process in state $i$ at time $t$ is allowed only two options: either
remain in state $i$ at time $t + 1$ or transition to state $i + 1$ at time $t + 1$. Thus the state
transition matrix $A$ has entries on the main diagonal and the first superdiagonal with
zeros elsewhere; in the example below, the last state additionally returns to state 1:

\[
A = \begin{bmatrix}
.3 & .7 & 0 & 0 & 0 \\
0 & .2 & .8 & 0 & 0 \\
0 & 0 & .4 & .6 & 0 \\
0 & 0 & 0 & .5 & .5 \\
.4 & 0 & 0 & 0 & .6
\end{bmatrix}
\]
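
A left-right transition matrix of this form can be built from the self-transition probabilities alone. The sketch below is illustrative; the function name `left_right_transition` and the `wrap` flag are hypothetical. With `wrap=True` the last state returns to the first, as in the example matrix; with `wrap=False` the last state is absorbing.

```python
import numpy as np

def left_right_transition(stay_probs, wrap=False):
    """Build a left-right transition matrix from self-transition probabilities."""
    stay = np.asarray(stay_probs, dtype=float)
    S = stay.size
    A = np.diag(stay)                         # probability of remaining in state i
    for i in range(S - 1):
        A[i, i + 1] = 1.0 - stay[i]           # probability of moving to state i + 1
    if wrap:
        A[S - 1, 0] = 1.0 - stay[S - 1]       # last state returns to the first
    else:
        A[S - 1, S - 1] = 1.0                 # last state is absorbing
    return A

A = left_right_transition([0.3, 0.2, 0.4, 0.5, 0.6], wrap=True)
```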


Another  topology  decision  is  modeling  of  the  observation  space.  A  discrete 
HMM  employs  a  discrete  observation  space,  called  an  alphabet,  which  consists  of 
Q  symbols.  A  discrete  observation  probability  matrix  defines  the  probability  of 
producing  a  symbol  given  the  state  of  the  model.  The  matrix  B  has  dimension 
S  x  Q,  and  the  number  of  parameters  in  the  model  grows  linearly  with  Q. 

A discrete HMM may be used to model continuous observation data, but the
data must be quantized using some method, typically a k-means clustering, into
a discrete alphabet. Information is lost during the quantizing process, but model
performance may not decrease substantially.

A Gaussian HMM assumes that the observation space is normally distributed,
and $B$ is no longer an observation distribution matrix in the sense of the discrete
HMM. Instead, $B$ contains the parameter pair $\mu$ and $\sigma^2$ for the Gaussian associated
with each hidden state.

The feature sets considered in the following model development are
multi-dimensional. For instance, an observation vector in the maximum value feature
set $x^{(\max)}$ has dimension 7. The assumed Gaussian observation space is
multi-dimensional, and a decision must be made to model the observations using seven
1-D HMMs, or one 7-D HMM, or some combination of lower-dimensioned HMMs.

The initial state distribution allows control of the initial state of the Markov
chain. By setting $\pi$ to a uniform distribution over all states, no prior information is
inserted about the initial state.

Suppose that a relationship must be forced between the hidden states and
the observation sequence, e.g., for the aspect angle of the ground target in the
SAR-derived observation data. As described above, the observations are ordered by aspect
angle, beginning at 1 degree and ending at 360 degrees. Assume the observations are
a function of the aspect angle of the target, i.e., when the target is viewed from
a certain aspect window, the observations come from a specific state in the hidden
process. Then, given an observation sequence that begins at a target aspect angle of
1 degree, the model can be forced to start in state 1 by setting the first element of
$\pi$ to 1 with zeros elsewhere.

3.4.3 Fusion Approaches

In a basic HMM-based classifier, one model is trained for each class of data.
A test record of unknown class is evaluated by each class-specific HMM, producing
a log-likelihood, and class membership is assigned according to the greatest
log-likelihood. If the test record is of dimension $n$ and the classification system uses
1-D HMMs, then for each class of data $n$ HMMs must be trained. For a three class
problem such as that described above using 1-D HMMs operating on the maximum
value feature set $x^{(\max)}$, the MCS consists of 3 × 7 = 21 HMMs. The output from
this bank of models must be fused to assign a final class label.

Figure 30 schematically describes the majority vote and mean log-likelihood fusion
schemes. The MCS considers 1-D HMMs operating on two feature sets, where feature
sets can originate from separate sensors. As noted above, when using 1-D HMMs,
a model must be trained for every class of data and for every dimension of the
observation feature vector. The 1-D observation sequences (step 1) are evaluated
by the bank of trained HMMs (step 2), producing log-likelihoods (step 3).

Two methods are used to fuse within the feature set (i.e., produce a single
sensor class label). A majority vote scheme tallies the winning votes for each
class-specific model across the $n$ dimensions of the feature set. Class membership is
assigned to the class with the greatest number of votes. A mean log-likelihood scheme
computes the mean output for each class across the $n$ dimensions of the feature set.
Class membership is assigned to the class with the largest mean log-likelihood. The
process is repeated for the second feature set.

The  same  schemes  can  be  used  to  fuse  across  the  feature  sets.  If  feature  set  1 
has  dimension  7  and  feature  set  2  has  dimension  6,  then  the  voting  scheme  assigns 
class membership by tallying across 13 dimensions. Likewise, the mean log-likelihood
scheme incorporates all 13 features before the final class assignment.

Figure 30. Fusion of multiple one-dimensional HMMs using a voting scheme and
mean log-likelihoods.
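
The two fusion rules can be sketched for a matrix of log-likelihoods with one row per class and one column per feature dimension. The function names are hypothetical, and breaking ties in favor of the lowest class index is an assumption; the text does not describe tie handling.

```python
import numpy as np

def fuse_majority_vote(loglik):
    """loglik[c, d]: log-likelihood of the class-c HMM for feature dimension d."""
    loglik = np.asarray(loglik, dtype=float)
    winners = np.argmax(loglik, axis=0)                  # winning class per dimension
    votes = np.bincount(winners, minlength=loglik.shape[0])
    return int(np.argmax(votes))                         # ties go to the lowest index (assumed)

def fuse_mean_loglik(loglik):
    """Assign the class with the largest mean log-likelihood across dimensions."""
    loglik = np.asarray(loglik, dtype=float)
    return int(np.argmax(loglik.mean(axis=1)))

# Toy values chosen only to show that the two rules can disagree.
ll = np.array([[-1.0, -1.0, -1.0, -100.0],
               [-2.0, -2.0, -2.0, -3.0],
               [-5.0, -5.0, -5.0, -1.0]])
vote_label = fuse_majority_vote(ll)
mean_label = fuse_mean_loglik(ll)
```

In the toy example, class 0 wins three of four votes, but one very poor log-likelihood pulls its mean below that of class 1, so the two rules return different labels. Fusing across feature sets amounts to concatenating the columns of the two log-likelihood matrices before applying either rule.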

3.4.4 Development Results

3.4.4.1 Discrete HMM

This  section  describes  a  discrete  HMM-based  MCS  used  to  classify  sequenced 
observations  derived  from  MSTAR  SAR  data.  The  experiment  described  here  is 
derived from Albrecht and Gustafson’s conference paper [99]. The data consist of
SAR  chips  of  three  ground  targets  (T-72,  BTR-70,  and  BMP-2)  collected  from  an 
airborne  sensor.  Each  chip  is  a  2-D  signature  centered  on  a  single  target.  Targets 
are  not  occluded  and  ground  clutter  is  minimal  (light  vegetation). 

Target  SAR  chips  are  processed  into  HRR  signatures,  and  the  signatures  are 
then  ordered  by  the  relative  sensor-target  aspect  angle.  Features  are  extracted  from 
the HRR signatures. Sequences of features, ordered by increasing aspect angle, form
the  observations  used  to  train  the  discrete  HMM-based  MCS.  Training  data  are 
collected  at  a  depression  angle  of  17°  and  testing  data  are  collected  at  15°. 

Two sets of features, representing information from two sensors to be fused
later, are extracted from target HRR signatures. The first feature set, called the “bin
feature set,” is the maximum value within the 7 HRR range windows of Section 3.4.1,
$x^{(\max)}$. The second feature set, called the “FFT feature set,” is the discrete Fourier
transform of the HRR signature, $x^{(\mathrm{fft})}$.

The  observation  data  are  quantized  in  order  to  apply  discrete  HMMs.  A  linear 
quantization  method  is  used  to  transform  the  continuous  data  into  an  alphabet  with 
Q  symbols.  This  method  maps  feature  data  into  Q  uniformly  spaced  intervals  based 
on  the  minimum  and  maximum  values  of  the  training  feature  data.  The  state  space 
of  the  discrete  HMMs  is  fully-connected  and  consists  of  S  states. 
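
The linear quantization can be sketched as follows. The function names are hypothetical, and clipping test values that fall outside the training range into the first or last symbol is an assumption; the text does not say how such values are handled.

```python
import numpy as np

def fit_linear_quantizer(train_values, Q):
    """Return Q + 1 uniformly spaced bin edges spanning the training data range."""
    lo, hi = float(np.min(train_values)), float(np.max(train_values))
    return np.linspace(lo, hi, Q + 1)

def quantize(values, edges):
    """Map continuous values to symbols 0 .. Q-1 using the fitted edges."""
    # Only the interior edges are used, so values below the training minimum
    # map to symbol 0 and values above the maximum map to symbol Q - 1 (assumed).
    return np.digitize(values, edges[1:-1])

edges = fit_linear_quantizer([0.0, 10.0], Q=10)
symbols = quantize([0.0, 0.5, 1.0, 9.99, 10.0, 25.0, -3.0], edges)
```

The edges are fitted on training data only, so the same mapping is applied unchanged to the test sequences.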

The experiment explores classification performance by varying the number of
states in the discrete HMMs, i.e., S = 2, 3, ..., 20, the length of the observation
sequence, and the method of fusing the component classifier outputs.

Figure  31  presents  classification  performance  of  the  discrete  HMM-based  MCS 
using  only  the  bin  feature  set.  Seven  HMMs  per  target  type  are  employed  in  the 
MCS.  The  results  shown  in  Fig.  31  reflect  performance  using  HMMs  with  discrete 
observation  alphabets  of  10  symbols  (i.e.,  Q  =  10).  Not  shown  are  results  at  Q  =  30, 
where  performance  dropped  considerably  below  that  seen  with  Q  =  10. 

Several trends are evident in the subplots of Fig. 31. First, as the observation
sequence length increases, classification performance increases. This result follows
intuitively as the classifier is presented more information with a longer sequence length.
Second, a general downward trend in performance coincides with increasing model
complexity. Third, fusing the seven HMM outputs using the mean log-likelihood
rule results in significantly improved performance over the majority vote rule at
lower model complexity settings. Finally, the fusion rules perform better than their
components acting independently.

Figure 31. Fusion of multiple one-dimensional HMMs using a voting scheme and
mean log-likelihoods operating on bin features.

Figure 32 presents classification performance of the discrete HMM-based MCS
using only the FFT feature set. Six HMMs per target type are employed in the
MCS. The results shown in Fig. 32 reflect performance using HMMs with discrete
observation alphabets of 10 symbols (i.e., Q = 10). Not shown are results at Q = 30,
where performance dropped considerably below that seen with Q = 10.

Several trends are evident in the subplots of Fig. 32. First, compared to the
bin feature set of Fig. 31, the FFT feature set performs poorly. There is insignificant
improvement as sequence length increases, and classifier performance is relatively
independent of model complexity. However, the fusion rules perform better than
their components acting independently.

Figure 33 presents classification performance of the discrete HMM-based MCS
using both feature sets. Thirteen HMMs per target type are employed in the MCS.
The results shown in Fig. 33 reflect performance using HMMs with discrete
observation alphabets of 10 symbols (i.e., Q = 10). Not shown are results at Q = 30, where
performance dropped considerably below that seen with Q = 10.

Figure 32. Fusion of multiple one-dimensional HMMs using a voting scheme and
mean log-likelihoods operating on FFT features.


Fusing  both  feature  sets  improves  performance  over  using  either  feature  set 
alone. The trends in Fig. 31 can be seen in the fused performance. As the observation
sequence length increases, classification performance increases. A general downward
trend  in  performance  coincides  with  increasing  model  complexity.  Fusing  outputs 
with  the  mean  log-likelihood  rule  results  in  significantly  better  performance  than  the 
majority  vote  rule  at  lower  model  complexity  settings. 

A random target classifier operating on 3 targets has a probability of correct
selection (PCS) of 33%. The HMM-based classifier used in this experiment peaked
at 94% PCS and typically operated between 70 and 80% at low model complexity
and small alphabet size settings.

Figure 33. Fusion of multiple one-dimensional HMMs using a voting scheme and
mean log-likelihoods across two feature sets.


3.4.4.2 Gaussian HMM


This section describes a Gaussian HMM-based MCS used to classify sequenced
observations derived from MSTAR SAR data. The experiment described here
extends the discrete research of the previous section [99] and is derived from Albrecht
and Bauer’s conference paper [100]. As in the discrete case, the data consist of SAR
chips of three ground targets (T-72, BTR-70, and BMP-2) collected from an airborne
sensor. Each chip is a 2-D signature centered on a single target. Targets are not
occluded and ground clutter is minimal (light vegetation).

Two sets of features, representing information from two sensors to be fused
later, are extracted from target HRR signatures. The first feature set, called the “bin
feature set,” is the maximum value within 7 HRR range windows (see Section 3.4.1),
$x^{(\max)}$. The second feature set, called the “FFT feature set,” is the discrete Fourier
transform of the HRR signature, $x^{(\mathrm{fft})}$.

Figure 34. Observations in the feature space are linked to the observation
distributions of the hidden states.


The  state  structure  employed  in  the  experiment  described  is  a  left-right  model. 
The  state  transition  matrix  explicitly  restricts  transitions  to  two  possibilities.  First, 
the  process  remains  in  the  same  state  (i.e.,  jumps  to  itself  at  the  next  time  step). 
Second,  the  process  transitions  from  a  state  to  its  adjacent  neighbor.  Using  a  left- 
right  model  may  link  observations  to  the  ordered  transition  between  states  as  shown 
in  Fig.  34. 

In a Gaussian HMM, observations from a given hidden state are distributed
normally with parameters $\mu$ and $\sigma^2$, the mean and variance. The parameter
re-estimation algorithm used to train the Gaussian HMM requires an initial parameter
pair for each state distribution. Using the left-right model paradigm, each state is
initially assumed to cover observations within a certain aspect window. For example,
each state in a model containing 30 hidden states in the left-right state space initially
covers a 360/30 = 12 degree aspect window. The sample mean and variance of
observations in the training data corresponding to the aspect window are used to
initialize the Gaussian HMM state observation distribution parameters.
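
The initialization described above can be sketched for a single 1-D feature sequence sampled at 1 degree resolution. The function name `init_state_gaussians` is hypothetical, and the use of the unbiased sample variance (`ddof=1`) is an assumption.

```python
import numpy as np

def init_state_gaussians(obs, n_states):
    """Initial (mean, variance) per hidden state from equal-width aspect windows.

    obs: 1-D training sequence ordered by aspect, one value per degree (length 360).
    """
    obs = np.asarray(obs, dtype=float)
    windows = np.array_split(obs, n_states)          # e.g. 30 windows of 12 degrees
    means = np.array([w.mean() for w in windows])
    variances = np.array([w.var(ddof=1) for w in windows])  # unbiased estimate (assumed)
    return means, variances

# Toy sequence: the observation at each degree is simply its index.
means, variances = init_state_gaussians(np.arange(360.0), n_states=30)
```

These values only seed the re-estimation algorithm; training is free to move each state's distribution away from its initial aspect window.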

The experiment explores classification performance by varying the number of
states in the Gaussian HMMs, S = 10, 20, 30, 40, 60, 72, and 90, the length of the
observation sequence, and the method of fusing the component classifier outputs.

Figure 35 presents classification performance of the Gaussian HMM-based
MCS. Because of the relationship between the hidden states and the target aspect
angle, prior knowledge of the target pose can be used. In Fig. 35 no prior knowledge
is used for classification.

Several trends are evident in the subplots of Fig. 35. First, as the observation
sequence length increases, classification performance increases. This result follows
intuitively as the classifier is presented more information with a longer sequence
length. Second, increased model complexity yields negligible performance
improvement. Third, fusing the seven HMM outputs using the mean log-likelihood rule
results in better performance than the majority vote rule across model complexity
settings.

Figure 36 presents classification performance of the Gaussian HMM-based MCS
with prior knowledge of target pose. The pose information is incorporated into the
model by specifying the initial state when testing an observation sequence. Given a
test sequence which begins with an observation of the target at 65° relative aspect
angle and a Gaussian HMM with 30 hidden states (each state initialized to cover a
12° window), the initial state probability vector $\pi$ is set to zeros everywhere except
element 5, which is set to 1. Thus, the test sequence is evaluated with the hidden
process beginning in state 5.
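
The mapping from a known starting aspect to an initial state vector can be sketched as below. The function name `initial_state_from_aspect` is hypothetical, and the partition of aspect into windows (0°, 12°], (12°, 24°], and so on is an assumption. Under that partition a 65° start maps to zero-based index 5, which is the sixth window when counting from one; the text does not state which indexing convention its "element 5" follows.

```python
import numpy as np

def initial_state_from_aspect(aspect_deg, n_states):
    """One-hot initial state vector for a sequence starting at aspect_deg."""
    width = 360.0 / n_states                        # e.g. 12 degrees for 30 states
    # Assumed windows: (0, width], (width, 2*width], ...; index is zero-based.
    idx = (int(np.ceil(aspect_deg / width)) - 1) % n_states
    pi = np.zeros(n_states)
    pi[idx] = 1.0
    return pi

pi = initial_state_from_aspect(65.0, n_states=30)
```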

Incorporating prior target pose information significantly improves classification
performance, as seen in Fig. 36. Prior aspect knowledge also removes the relative
performance difference between the two fusion methodologies.

3.4.4.3 Multi-dimensional Gaussian HMM

This section describes a multi-dimensional Gaussian HMM-based MCS used
to classify sequenced observations derived from MSTAR SAR data. The experiment
described here extends the one-dimensional Gaussian HMM research of the previous
section. As in the previous cases, the data consist of SAR chips of three ground
targets (T-72, BTR-70, and BMP-2) collected from an airborne sensor. Each chip
is a 2-D signature centered on a single target. Targets are not occluded and ground
clutter is minimal (light vegetation).

Figure 35. Fusion of multiple one-dimensional Gaussian HMMs using a voting
scheme and mean log-likelihoods. No prior knowledge of target aspect
angle is used.

Two sets of features, representing information from two sensors to be fused
later, are extracted from target HRR signatures. The first feature set, called the “bin
feature set,” is the maximum value within 7 HRR range windows (see Section 3.4.1),
$x^{(\max)}$. The second feature set, called the “FFT feature set,” is the discrete Fourier
transform of the HRR signature, $x^{(\mathrm{fft})}$. The state structure employed is a left-right
model as in the previous section.

In a multi-dimensional Gaussian HMM, observations from a given hidden state
are distributed multi-variate normally with parameters $\mu$ and $\Sigma$, the mean vector
and covariance matrix. The parameter re-estimation algorithm used to train the
Gaussian HMM requires an initial parameter pair for each state distribution. Using
the left-right model paradigm, each state is initially assumed to cover observations
within a certain aspect window. For example, each state in a model containing 30
hidden states in the left-right state space initially covers a 360/30 = 12 degree aspect
window. The sample mean and covariance matrix of observations in the training data
corresponding to the aspect window are used to initialize the Gaussian HMM state
observation distribution parameters.

Figure 36. Fusion of multiple one-dimensional Gaussian HMMs using a voting
scheme and mean log-likelihoods. Prior knowledge of target aspect
angle as a function of the number of hidden states is used.

As seen in Fig. 37, the observation space is assumed to be multi-variate normal
with dimension 7, which covers the feature space of the first feature set, $x^{(\max)}$. A
second multi-dimensional Gaussian HMM is used to model the second feature set,
$x^{(\mathrm{fft})}$, with dimension 6. Thus, for each target only two HMMs are needed versus 13
models for the case of Sec. 3.4.4.2.


Figure  37.  Multi-dimensional  observations  in  the  feature  space  are  linked  to  the 
observation  distributions  of  the  hidden  states. 


With  only  two  model  outputs  to  combine,  the  majority  vote  fusion  method  is 
not  used.  Instead,  only  the  mean  log-likelihood  method  is  used.  The  experiment 
explores  classification  performance  by  varying  the  number  of  states  in  the  Gaussian 
HMMs,  S  =  10,  20,  30,  40,  60,  72,  and  90,  and  the  length  of  the  observation  sequence. 

Figure 38 presents classification performance of the multi-dimensional Gaussian
HMM-based MCS with no prior target aspect information. Several trends are
evident in the subplots of Fig. 38. First, as the observation sequence length increases,
classification performance increases. This follows intuitively as the classifier is
presented more information with a longer sequence length. Second, model performance
decreases with increased model complexity.

Figure 39 presents classification performance of the multi-dimensional Gaussian
HMM-based MCS with prior knowledge of the target pose. The pose information is
incorporated into the model as described in Sec. 3.4.4.2. Incorporating prior target
pose information improves classification performance. At the longest sequence length
setting, near-perfect classification is achieved.

Figure 40 shows Akaike’s information criterion (AIC) for the case of
multi-dimensional Gaussian HMMs. Each dashed line represents AIC versus number of
hidden states in a class specific model. Since the target set has three members, there
are three AIC lines. The solid line is the mean AIC across all three target types.

Figure 38. Fusion of multi-dimensional Gaussian HMMs using mean
log-likelihoods. No prior knowledge of target aspect angle is used.

Figure 39. Fusion of multi-dimensional Gaussian HMMs using mean
log-likelihoods. Prior knowledge of target aspect angle as a function of
the number of hidden states is used.

Figure 40. Akaike’s information criterion versus number of hidden states for the
multi-dimensional Gaussian HMM. Three dashed lines represent AIC
for target-specific models. The solid line is the mean AIC across target
models.

As  the  number  of  hidden  states  increases,  the  amount  of  information  lost  (AIC) 
decreases,  reaching  a  minimum  at  72  hidden  states.  The  AIC  at  60  and  90  hidden 
states  is  approximately  equal  to  that  of  72  hidden  states. 

The  AIC  suggests  that  an  appropriate  level  of  model  complexity  given  the 
training  data  used  in  the  multi-dimensional  Gaussian  HMM  experiment  is  either  60, 
72,  or  90  hidden  states. 
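
For reference, AIC-based order selection can be sketched with the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and L is the maximized likelihood. The function names are hypothetical and the numbers in the example are toy values, not results from this experiment; how k is counted for a particular HMM topology is left to the caller.

```python
import numpy as np

def aic(log_likelihood, n_params):
    """Akaike's information criterion; smaller values indicate a preferred model."""
    return 2.0 * n_params - 2.0 * log_likelihood

def best_order(orders, log_likelihoods, param_counts):
    """Return the candidate order with the smallest AIC, plus all AIC scores."""
    scores = [aic(ll, k) for ll, k in zip(log_likelihoods, param_counts)]
    return orders[int(np.argmin(scores))], scores

# Toy values for illustration only.
order, scores = best_order([2, 3, 4], [-120.0, -100.0, -99.0], [5, 8, 11])
```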

3.5 Summary

This chapter provides an introductory example of a discrete HMM applied in a
genetic sequence classification experiment. In addition, it applies model complexity
theory to the study of HMMs. Sections 3.3.1 and 3.3.2 apply AIC and BIC
information theoretic measures to discrete and continuous HMMs in a controlled experiment
to identify appropriate model complexity. In these experiments, data are generated
using Markov chains with 3 states. HMMs of varying complexity are trained and
tested, and resulting AIC and BIC measures are calculated. In each case AIC and
BIC concur that an HMM of order 3 is the best-suited model given the data.

Sections 3.4.4.1, 3.4.4.2, and 3.4.4.3 detail the development of discrete,
1-dimensional, and multi-dimensional Gaussian HMMs for an ATR classifier using
sequenced SAR data as input. Performance measures and model topology suggest
that a multi-dimensional Gaussian HMM with 60, 72, or 90 hidden states is most
appropriate given SAR-based feature data as in Sec. 3.4.1.

4. CID Optimization Formulation

This  chapter  presents  a  combat  identification  (CID)  optimization  formulation  that 
extends  the  CID  framework  proposed  by  Laine  [15].  It  begins  by  defining  CID 
and  automatic  target  recognition  (ATR)  related  terms.  Next,  CID  analysis  using 
receiver  operating  characteristic  (ROC)  curves  and  confusion  matrices  is  covered. 
Finally,  Laine’s  framework  is  covered,  and  the  extension  to  include  out-of-library 
methodology  is  presented. 

4.1 Definitions

The  following  terms  related  to  research  in  the  area  of  CID  and  ATR  are  defined 
prior  to  the  presentation  of  the  proposed  extended  framework. 

ATD/R  Automatic target detection and recognition refers to the process
of detecting a region of interest (ROI) where a target may reside.
The assumption in this research is that target detection has been
accomplished, an ROI is established, and target recognition is the
primary function.

ATR  Automatic target recognition refers to the process of classifying
objects in the ROI. In this research, ATR is performed with no human
in-the-loop. Figure 41 shows a notional ATR system which incorporates
two sensors and a fusion rule to combine sensor output prior to
labeling the target.

CID  Combat identification is the process of obtaining accurate
characterizations of detected objects in the joint battlespace to the extent
that high confidence and timely application of military options and
weapons resources can occur [3]. An ATR system may be part of a
CID system. A pilot’s eyes may be part of a CID system.

Figure 41. Notional ATR system with two sensors evaluating observations through
time t = T.

Clutter  Clutter  encompasses  the  set  of  naturally  occurring  objects  that 
degrade  sensor  performance.  Examples  of  clutter  include  trees,  rocks, 
and  vegetation. 

Confuser  Confusers  are  man-made  objects  which  a  sensor  may  confuse 
with  a  true,  in-library  target.  A  decoy  is  an  example  of  a  confuser. 

EOC  Extended operating conditions are those variations of target
presentation and environment which alter the sensed target signature
from that of the training signature. Examples of EOCs include turret
and barrel position of a tank, dense foliage versus sparse foliage, and
depression angle from airborne sensor to ground target.

In-library  refers  to  target  types  present  in  the  classifier  training  set. 

Label  is  the  output  of  a  classifier,  or  multi-classifier  ATR  system,  when 
presented  an  ROI.  Labels  include  “hostile,”  “friend,”  and  “non-declare.” 

Out-of-library  refers  to  target  types  not  present  in  the  classifier  training 
set.  The  classifier  has  not  been  trained  to  recognize  these  targets. 

ROI  An ATR system is cued to a region of interest in order to classify
the target residing in the ROI.

Target class refers to a grouping of similar target types. The grouping
may depend on target intent (hostile, friendly, neutral), country of
origin (U.S., NATO, or China), or vehicle type (tank, missile launcher,
or truck).

Target type refers to classification based on high-fidelity physical
properties of the target. Variants of the T-72 all fit within the T-72 target
type. Another main battle tank, the M-1A1, and its variants form
another target type.


4.2 ROC and confusion matrix analysis


Traditional ATR performance analysis uses ROC curves and confusion matrices
to estimate system performance [97]. ROC curves relate classification performance
by moving a threshold from conservative to aggressive settings. Typically, a ROC
curve shows the trade-off between true-positive and false-positive performance as a
function of a moving ROC threshold, $\theta \in [0, 1]$, from 0 to 1.

At each threshold setting, true-positive and false-positive calculations are made
based on class posterior probabilities output by the classifier. Given a threshold $\theta$
and two-class posterior probabilities, $pp_T$ (target) and $pp_F$ (friend), the classifier
labels test records according to:

\[
\text{label} =
\begin{cases}
\text{“target”} & \text{if } pp_T > \theta \\
\text{“friend”} & \text{if } pp_T < \theta
\end{cases}
\tag{48}
\]


By  comparing  true  class  with  classifier-assigned  labels,  true-positive  and  false-positive 
metrics  are  derived.  Plotting  the  true-positive  and  false-positive  pairings  for  each 
threshold  setting  produces  a  ROC  curve. 
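
The threshold sweep can be sketched as follows. The function name `roc_points` is hypothetical, and labeling records whose posterior exactly equals the threshold as "friend" is an assumption made only to resolve ties.

```python
import numpy as np

def roc_points(pp_target, is_target, thresholds):
    """Return (false-positive rate, true-positive rate) pairs, one per threshold."""
    pp = np.asarray(pp_target, dtype=float)
    truth = np.asarray(is_target, dtype=bool)
    points = []
    for theta in thresholds:
        declared_target = pp > theta                 # ties are labeled "friend" (assumed)
        tp_rate = np.mean(declared_target[truth])    # true targets labeled "target"
        fp_rate = np.mean(declared_target[~truth])   # true friends labeled "target"
        points.append((fp_rate, tp_rate))
    return np.array(points)

# Toy posteriors for three true targets followed by three true friends.
pts = roc_points([0.9, 0.8, 0.3, 0.6, 0.2, 0.1],
                 [1, 1, 1, 0, 0, 0],
                 thresholds=[0.0, 0.5, 1.0])
```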

Laine [15] introduces the idea of a ROC surface with the addition of a rejection option. A third performance measure, probability of declaration, $P_{dec}$, is added to the two already in use, probability of true-positive, $P_{tp}$, and probability of false-positive, $P_{fp}$. Here $P_{dec}$ accounts for the records labeled “no declaration.” The three measures are estimated as a function of the threshold $\theta$ such that

$$
P_{tp} = P_{tp}(\theta), \quad P_{fp} = P_{fp}(\theta), \quad \text{and} \quad P_{dec} = P_{dec}(\theta) = 1 - P_{rej}(\theta), \tag{49}
$$

where $P_{rej}$ is the estimated probability of labeling “non-declaration.”

The ROC surface $s$ is produced by varying $\theta$ over its range $\Theta$:

$$
s = s(\Theta) = \{ (P_{tp}(\theta),\ P_{fp}(\theta),\ P_{dec}(\theta)) : \theta \in \Theta \}, \tag{50}
$$

where the threshold $\theta$ now defines a rejection region. The center of the rejection region is defined by $\theta_{roc}$, and the rejection region half-width is given by $\theta_{rej}$. Thus, the bounds on the rejection region are $(\theta_{roc} - \theta_{rej},\ \theta_{roc} + \theta_{rej})$.

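Labeling with a rejection option follows:

$$
\text{label} =
\begin{cases}
\text{“target”} & \text{for } ppT > \theta_{roc} + \theta_{rej} \\
\text{“friend”} & \text{for } ppT < \theta_{roc} - \theta_{rej} \\
\text{“non-declare”} & \text{for } \theta_{roc} - \theta_{rej} \le ppT \le \theta_{roc} + \theta_{rej}
\end{cases}
\tag{51}
$$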

Figure 42 depicts the labeling process given a rejection region. Two distributions of classifier-produced posterior probabilities are given. Records of the true target class tend to have higher posterior probabilities, while records of the true friend class tend to have lower posterior probabilities. The two distributions overlap, creating classification errors given a decision boundary. By inserting a rejection region, the classifier declares only those records with high likelihood of class membership [89]. Classification errors are reduced at the expense of fewer declarations.
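
The trade between errors and declarations can be illustrated with a short Python sketch. It is a sketch under stated assumptions: the function names and sample posteriors are hypothetical, records falling exactly on a bound are treated as non-declared, and the true-positive and false-positive estimates are computed over declared records only.

```python
def label_with_rejection(pp_target, theta_roc, theta_rej):
    """Three-way labeling around a rejection region centered on theta_roc."""
    if pp_target > theta_roc + theta_rej:
        return "target"
    if pp_target < theta_roc - theta_rej:
        return "friend"
    return "non-declare"


def surface_point(pp_target, is_target, theta_roc, theta_rej):
    """Return (P_tp, P_fp, P_dec) for one (theta_roc, theta_rej) pair."""
    labels = [label_with_rejection(pp, theta_roc, theta_rej) for pp in pp_target]
    declared = [(lab, t) for lab, t in zip(labels, is_target) if lab != "non-declare"]
    p_dec = len(declared) / len(labels)
    n_pos = sum(1 for _, t in declared if t)       # declared true targets
    n_neg = len(declared) - n_pos                  # declared true friends
    tp = sum(1 for lab, t in declared if lab == "target" and t)
    fp = sum(1 for lab, t in declared if lab == "target" and not t)
    p_tp = tp / n_pos if n_pos else 0.0
    p_fp = fp / n_neg if n_neg else 0.0
    return p_tp, p_fp, p_dec


# Hypothetical target posteriors for three true targets and three true friends.
pp = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
truth = [True, True, True, False, False, False]
no_reject = surface_point(pp, truth, 0.5, 0.0)
with_reject = surface_point(pp, truth, 0.5, 0.15)
```

With these sample values, widening the rejection region removes both errors while the declaration rate drops from 1 to 2/3. Sweeping both thresholds over a grid yields the points of a ROC surface.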

Figure 42. Rejection region based on ROC and rejection thresholds applied to class posterior probability.

A ROC surface plots ROC curves across a third dimension which measures classifier declaration performance. Declaration performance is a function of the width of the rejection region; the wider the region, the more “non-declaration” labels, resulting in a lower declaration rate. By varying $\theta_{roc}$ and $\theta_{rej}$ from conservative to aggressive settings, a ROC surface is produced. Figure 43 provides an example ROC surface. The plot shows decreased performance as declaration rate increases.

Analysis of CID system performance with confusion matrices yields a table of classifier labeling versus truth given a set of test data and thresholds ($\theta_{roc}$ and $\theta_{rej}$). Figure 44 is a confusion matrix for a system with four classifier labels (“enemy,” “friend,” “neutral,” and “non-declaration”) and three true target classes. Each matrix entry represents test record labeling conditioned on true target class. For example, the first row shows the number of true-enemy records labeled “Enemy,” “Friend,” “Neutral,” and “Non-declare,” respectively. Reading horizontally indicates how well the classifier identifies true-enemy records. A common horizontal metric is the probability of true-positive: the proportion of true-enemy records labeled “Enemy” given that a declaration is made.

Figure 43. Family of ROC curves measuring true-positive and false-positive performance versus percentage of records declared. Points are experiment measurements.

Figure 44. Confusion matrix with FEN classes and non-declaration option.

The columns of the confusion matrix indicate the true class of the records given a specific label. For example, the second column shows the respective number of true-enemy, friend, and neutral class records labeled “Friend” by the classifier. Reading vertically indicates classifier accuracy when applying the “Friend” label. One vertical metric is the fratricide rate: the number of “Enemy” labels applied to true-friend records given that a declaration is made.

Figure  44  uses  hatching  to  distinguish  between  performance  measures.  The 
entries  on  the  main  diagonal  reflect  correct  labeling  of  each  true  class  of  target. 
Critical errors include both mislabeling a true-enemy as “Friend” or “Neutral,”
and applying an “Enemy” label to a true-friend or neutral. Lesser errors have little
impact  on  warfighter  decisions  and  include  the  cross-labeling  of  friend  and  neutral 
records.  If  friends  and  neutrals  are  treated  in  the  same  fashion,  then  cross-labeling 
errors  are  inconsequential. 

Figure 45 shows a confusion matrix with two types of hostile targets and introduces non-critical errors as another performance measure. The number of true target classes remains the same by merging the friend and neutral classes due to the low impact of cross-labeling error. The enemy class is sub-divided into target-of-the-day (TOD) and other-hostile (OH) classes. This subdivision facilitates analysis of non-critical errors, which occur when incorrect hostile targets are engaged or when a weapons-target mismatch occurs. In either case, a suboptimal employment of resources occurs without loss of friendly/neutral life.

Figure 46 expands the number of true target classes to four with inclusion of an out-of-library class. Test records belonging to the out-of-library class are of target types not included in the classifier training set. Critical errors in Fig. 45 are similarly defined for Fig. 46. Non-critical errors expand to include mislabeling of true out-of-library targets as “Target-of-the-day” or “Other hostile,” and labeling as “Out-of-library” those targets of the true target-of-the-day or other-hostile class.

Figure 45. Confusion matrix with multiple hostile classes and non-declaration option.

Figure 46. Confusion matrix with multiple hostile classes, out-of-library records, and non-declaration option.

4.3 Extended mathematical programming CID optimization formulation

Laine’s optimization framework [15] uses a mathematical programming (MP) formulation to optimize CID systems without reference to fixed error costs. Nonlinear optimization of the decision space across classifier label mappings as a function of variable threshold settings provides a flexible objective function/constraint set pairing to suit warfighter preferences. For example, one strategy might be to maximize classifier true-positives while constraining the system to a maximum error rate and a minimum declaration rate.

Table 8 gives an initial MP CID formulation which includes an out-of-library performance measure. The initial formulation seeks to maximize the true-positive rate of the CID system subject to several constraints. The true-positive rate (TPR) is a measure of true-positives as a function of time or number of observations. The motivation is to capture the benefit of additional observations when classifying time-series data. Intuitively, a classifier performs better when given more (discriminatory) information, and TPR seeks to quantify the performance benefit of additional observations.

4.3.1 Decision variables

Decision  variables  used  in  the  MP  framework  are  organized  into  three  groups: 
choice  of  fusion  rule,  choice  of  sensors,  and  choice  of  thresholds  associated  with  the 
sensors  and  fusion  rule. 

Using Laine’s notation [15], $F_i$ is an indicator variable that describes use of the $i$th fusion rule. Since the CID system under investigation fuses multiple sensors with one fusion rule, only one of $I$ fusion rules may be selected in the optimal arrangement. Thus, $F_i = 1$ if the $i$th fusion rule is chosen and all other entries are set to zero.

Table 8. Initial MP Formulation of CID Optimization Framework

                              MP formulation       Impact
Objective function
  Maximize true positive rate max TPR(x)           maximize number of true positives per look
Constraints
  Critical error              E_crit  <= 0.02      upper bound on critical error performance measure
  Non-critical error          E_ncrit <= 0.05      upper bound on non-critical error performance measure
  True positive               P_tp    >= 0.9       lower bound on true positive performance measure
  Declarations                P_dec   >= 0.7       lower bound on declaration performance measure
  Out-of-library              P_ool   >= 0.6       lower bound on out-of-library performance measure

The fusion rule employs the output from an ensemble of sensors, where $S_j$ is an indicator variable taking a value of 1 if sensor $j$ is employed in the fusion scheme and 0 if not. Constraints may be employed to limit the selection to a certain number of sensors. For instance, the fused MCS may be required to use at least one but not more than three sensors.

The third group of decision variables comprises the thresholds related to the fusion rule and component sensors. Using the index $i$ to refer to the fusion method and $j$ to refer to the component sensor, $\theta^{ij}$ is the threshold related to a specific CID system decision using fusion rule $i$ and sensor $j$. Example thresholds are the ROC threshold $\theta^{0j}_{ROC}$ and rejection threshold $\theta^{0j}_{REJ}$, which together define the rejection region at the classifier level for classifier $j$ ($i = 0$ indicates that the threshold is not used at the fusion level).

4.3.2 Performance Measures

Given two competing CID systems, their performance is compared using estimates of true performance. The following sections develop these estimated performance measures (foregoing the estimator symbol ~).

4.3.2.1 True-positive

Horizontal  analysis  of  confusion  matrix  entries  produces  performance  estimates 
of  class  labels  given  true  class.  Some  flexibility  exists  in  using  true-positive  as  a 
performance  measure  because  the  user  identifies  which  class  is  the  sought  after  target 
class.  For  this  example,  target  classes  are  defined  in  Fig.  46  as  TOD,  OH,  FN,  and 
OOL  with  hostile  targets  TOD  and  OH  as  the  target  classes. 
Thus an estimate for true-positive performance is the number of true-hostile records labeled “Hostile” divided by the total number of true-hostile records:

$$
P_{tp} = P(\text{“TOD”} \cup \text{“OH”} \mid TOD \cup OH)
= \frac{\text{num}(\text{“TOD”}|TOD + \text{“TOD”}|OH + \text{“OH”}|TOD + \text{“OH”}|OH)}{\text{num}(TOD\ \text{eval} + OH\ \text{eval})}
\tag{52}
$$

Further refinement of the performance measure may restrict the calculation to records on which a declaration is made by the classifier. This refinement accounts for the added rejection option and resultant “non-declare” label. Here the calculation is

$$
P_{tp} = P(\text{“TOD”} \cup \text{“OH”} \mid (TOD \cup OH) \cap \text{declaration})
= \frac{\text{num}(\text{“TOD”}|TOD + \text{“TOD”}|OH + \text{“OH”}|TOD + \text{“OH”}|OH)}{\text{num}(TOD\ \text{declared} + OH\ \text{declared})}
\tag{53}
$$
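
A small worked example may help fix the two variants of the true-positive estimate. The sketch below assumes a count matrix with one row per true class and columns ordered as the labels “TOD,” “OH,” “FN,” “OOL,” and “Non-declare”; the counts and the function name are hypothetical and chosen only for illustration.

```python
LABELS = ["TOD", "OH", "FN", "OOL", "ND"]   # column order; ND = "Non-declare"
HOSTILE = ("TOD", "OH")


def true_positive(counts, declared_only=False):
    """Hostile records given either hostile label, over hostile records.

    With declared_only=True the denominator drops non-declared records.
    """
    hits = sum(counts[t][LABELS.index(lab)] for t in HOSTILE for lab in HOSTILE)
    total = 0
    for t in HOSTILE:
        row = counts[t]
        total += sum(row[:-1]) if declared_only else sum(row)
    return hits / total


# Hypothetical counts: rows are true classes, columns follow LABELS.
counts = {
    "TOD": [80, 5, 3, 2, 10],
    "OH":  [6, 70, 4, 5, 15],
    "FN":  [1, 2, 85, 2, 10],
    "OOL": [4, 6, 5, 60, 25],
}
p_tp_all = true_positive(counts)
p_tp_declared = true_positive(counts, declared_only=True)
```

For these counts the estimate over all evaluated hostile records is 161/200 = 0.805, while restricting to declared records gives 161/175 = 0.92.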

4.3.2.2 Critical error

Critical error calculation reverses the order of conditioning seen in the true-positive calculation; instead of finding the probability of correct label given a true class, critical error finds the probability of true class membership given a label. Critical error calculation involves vertical analysis of the confusion matrix.

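Using Fig. 46, critical error is

$$
P(E_{crit}) = P\left( \left(
\begin{array}{l}
P(\text{“TOD”} \cap FN) \cup P(\text{“OH”} \cap FN)\ \cup \\
P(\text{“FN”} \cap TOD) \cup P(\text{“FN”} \cap OH)
\end{array}
\right) \,\middle|\, \text{declaration} \right).
\tag{54}
$$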

Simplification of Eq. 54 makes use of Bayes’ rule, and depends on class prevalence as defined by class prior probabilities. Let P(TOD), P(OH), P(FN), and P(OOL) be the prior probabilities of the four true target classes, and let P(“TOD”), P(“OH”), P(“FN”), P(“OOL”), P(“Non-declare”) be the unconditional system label probabilities. Then

$$
P(TOD) + P(OH) + P(FN) + P(OOL) = 1 \tag{55}
$$

$$
P(\text{“TOD”}) + P(\text{“OH”}) + P(\text{“FN”}) + P(\text{“OOL”}) + P(\text{“Non-declare”}) = 1. \tag{56}
$$

The simplified $P(E_{crit})$ calculation is then

$$
P(E_{crit}) = \frac{P(\text{“TOD”}|FN)P(FN) + P(\text{“OH”}|FN)P(FN) + P(\text{“FN”}|TOD)P(TOD) + P(\text{“FN”}|OH)P(OH)}{1 - P(\text{“Non-declare”})}
\tag{57}
$$



where P(“Non-declare”) is determined by the sum of the class-specific probability of non-declaration:

$$
\begin{aligned}
P(\text{“Non-declare”}) ={} & P(\text{“Non-declare”}|TOD)P(TOD) + P(\text{“Non-declare”}|OH)P(OH) \\
& + P(\text{“Non-declare”}|FN)P(FN) \\
& + P(\text{“Non-declare”}|OOL)P(OOL).
\end{aligned}
\tag{58}
$$
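
As an illustration of how class priors enter the critical error estimate, the following Python sketch evaluates Eqs. 57 and 58 from a row-normalized confusion matrix. The counts, the priors, and the function names are hypothetical; the label column order (TOD, OH, FN, OOL, Non-declare) is an assumption of the example.

```python
# Column indices of the labels in every confusion-matrix row (assumed order).
TOD, OH, FN, OOL, ND = range(5)


def label_given_class(counts):
    """Row-normalize counts into estimates of P(label | true class)."""
    return {c: [n / sum(row) for n in row] for c, row in counts.items()}


def non_declare_prob(cond, priors):
    """Eq. 58: prior-weighted sum of class-specific non-declaration rates."""
    return sum(cond[c][ND] * priors[c] for c in priors)


def critical_error(cond, priors):
    """Eq. 57: critical-error mass divided by the declaration probability."""
    numerator = (
        cond["FN"][TOD] * priors["FN"]
        + cond["FN"][OH] * priors["FN"]
        + cond["TOD"][FN] * priors["TOD"]
        + cond["OH"][FN] * priors["OH"]
    )
    return numerator / (1.0 - non_declare_prob(cond, priors))


# Hypothetical counts (rows: true class) and hypothetical class priors.
counts = {
    "TOD": [80, 5, 3, 2, 10],
    "OH":  [6, 70, 4, 5, 15],
    "FN":  [1, 2, 85, 2, 10],
    "OOL": [4, 6, 5, 60, 25],
}
priors = {"TOD": 0.1, "OH": 0.2, "FN": 0.6, "OOL": 0.1}
cond = label_given_class(counts)
p_nd = non_declare_prob(cond, priors)
e_crit = critical_error(cond, priors)
```

With these sample values P(“Non-declare”) is 0.125 and the critical error is about 0.033. Changing the priors changes the result even when the confusion matrix is held fixed, which is the dependence on class prevalence noted above.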


4.3.2.3 Non-critical error

As with critical error, non-critical error calculation reverses the order of conditioning in the true-positive calculation. Non-critical error calculation involves vertical analysis of the confusion matrix.

Some flexibility exists in choosing which classification errors constitute non-critical errors. For this example, non-critical errors consider only cross-labeled hostile targets (i.e., TOD labeled “OH” and OH labeled “TOD”), which is a reduced non-critical error set from that shown in Fig. 46. Adjusting the non-critical error set requires relatively simple modification to the following calculations:

$$
P(E_{ncrit}) = P\big( ( P(\text{“TOD”}|OH) \cup P(\text{“OH”}|TOD) ) \mid \text{declaration} \big),
\tag{59}
$$

which simplifies to a non-critical error calculation that incorporates prior class probabilities, i.e.,

$$
P(E_{ncrit}) = \frac{P(\text{“TOD”}|OH)P(OH) + P(\text{“OH”}|TOD)P(TOD)}{1 - P(\text{“Non-declare”})},
\tag{60}
$$

where P(“Non-declare”) is determined by the sum of the class-specific probability of non-declaration (see Eq. 58).


4.3.2.4 Declaration

The  declaration  performance  measure  captures  the  percentage  of  test  records 
which  the  CID  system  labels  with  one  of  the  true  class  labels.  The  complementary 
measure  is  the  non-declaration  performance  measure.  It  tabulates  the  number  of 
records  labeled  “Non-declare”  by  the  system: 


$$
\begin{aligned}
P_{dec} &= 1 - P(\text{“Non-declare”}) \\
&= 1 - \big( P(\text{“Non-declare”}|TOD)P(TOD) + P(\text{“Non-declare”}|OH)P(OH) \\
&\qquad\quad + P(\text{“Non-declare”}|FN)P(FN) + P(\text{“Non-declare”}|OOL)P(OOL) \big)
\end{aligned}
\tag{61}
$$


4.3.2.5 Out-of-library

The  out-of-library  performance  measure  is  a  true-positive  labeling  of  “OOL” 
given  an  OOL  record  using  horizontal  analysis  of  confusion  matrix  entries.  For  this 
example,  target  classes  are  defined  as  in  Fig.  46  (TOD,  OH,  FN,  and  OOL). 
Thus, the estimate for the out-of-library performance measure is the number of true-OOL records labeled “OOL” divided by the total number of true-OOL records evaluated:

$$
P_{ool} = P(\text{“OOL”} \mid OOL) = \frac{\text{num}(\text{“OOL”} \mid OOL)}{\text{num}(OOL\ \text{eval})}
\tag{62}
$$

Further refinement of the performance measure may restrict the calculation to records on which a declaration is made by the classifier. This refinement accounts for the added rejection option and resultant “non-declare” label. Thus the calculation is

$$
P_{ool} = P(\text{“OOL”} \mid OOL \cap \text{declaration}) = \frac{\text{num}(\text{“OOL”} \mid OOL)}{\text{num}(OOL\ \text{declared})}.
\tag{63}
$$

The out-of-library labeling methodology is separate from the labeling methodology for in-library classes. Figure 47 shows a notional implementation of the out-of-library labeling methodology. Focusing on a single sensor, observations of a region of interest are made through time and passed to a feature processor to extract features, which are the basis for classification. The classifier produces a 10-dimensional class posterior probability vector. Based on feature observations, this vector is the classifier’s best guess at class membership. The vector is 10-dimensional because the classifier has been trained to recognize 10 in-library classes.

The proposed in-library/out-of-library discriminator takes the 10-dimensional class posterior probability vector as input and produces an 11-dimensional class posterior probability vector. The 11-D vector adds a posterior probability for the out-of-library (OOL) class as a function of the 10-D in-class vector.

Figure 47. Out-of-library discriminator added to a two-sensor notional ATR system evaluating observations through time t = T. Given ten in-library classes, classifier outputs are 10-dimensional class posterior probabilities. The out-of-library discriminator assigns an 11th posterior as a function of the 10 in-library posteriors. (Diagram elements: in-library classes TOD, OH1 to OH4, and FN1 to FN5; stages Features, Classifier output, and In-lib/Out-of-lib discriminator output; final labels “Target,” “Non-target,” “Out-of-library,” and “No Dec.”)


Given the 10-D in-class posterior probability vector

$$
x_{post} = [\, pp_{TOD} \quad pp_{OH1} \quad pp_{OH2} \quad \cdots \quad pp_{FN5} \,],
$$

the discrimination function sorts the posteriors in descending order, producing $x_{ord}$. Assuming that the classifier identifies in-library targets well, a small subset of the class posteriors is significantly larger than the remaining posteriors. In-library/out-of-library discrimination results from a threshold setting based on the sum of a subset of ordered posteriors.

Two threshold parameters are chosen through a nearly blind sub-optimization routine. The parameters are the number of ordered (largest to smallest) posteriors over which to sum, $\theta^{(1)}_{OOL}$, and the threshold against which the sum is compared, $\theta^{(2)}_{OOL}$. The parameter values are chosen to ensure a minimum discrimination performance given in-library and out-of-library records, hence the nearly blind description. For example, the sub-optimization routine may determine a threshold $\theta^{(2)}_{OOL}$ based on the sum of the second through sixth ordered posteriors, $\theta^{(1)}_{OOL} = 6$. Thus given $x_{ord}$ for a sample test record, the discriminator compares

$$
x_{ool} = \sum_{i=2}^{\theta^{(1)}_{OOL}} x_{ord}(i),
$$

i.e., the sum of the second through sixth ordered posteriors for the test record, with the threshold $\theta^{(2)}_{OOL}$, and it assigns the OOL posterior as a function of distance from the threshold:

$$
pp_{OOL} =
\begin{cases}
0 & \text{if } x_{ool} < \theta^{(2)}_{OOL} \\
f(x_{ool} - \theta^{(2)}_{OOL}) & \text{if } x_{ool} \ge \theta^{(2)}_{OOL}
\end{cases}
$$

If $x_{ool} < \theta^{(2)}_{OOL}$, then the record is considered an in-library class and the posterior probability for OOL is set to zero. If $x_{ool} \ge \theta^{(2)}_{OOL}$, then the record is an out-of-library record and the posterior probability for OOL is set to a monotonically-increasing function of the distance from the threshold:


m  =  TT^, 

where $d = x_{ool} - \theta^{(2)}_{OOL}$. Since $x_{ool} \in [0, 9/10]$ and $\theta^{(2)}_{OOL} \in [0, 1]$, $d \in [0, 9/10]$ and $f$ maps $d$ to $[1, 1.999]$. The value $pp_{OOL}$ is concatenated to the end of the 10-element estimated posterior vector $x_{post}$, and the result is normalized to produce the estimated 11-element posterior probability vector.
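
The discriminator logic can be sketched in Python as follows. This is only an illustration of the sort, sum, threshold, and renormalize steps: the function name, the parameter names `n_sum` and `threshold`, and the sample vectors are hypothetical, and the default `f(d) = 1 + d` is a stand-in monotonically increasing function assumed for the example, not the function defined in the text.

```python
def ool_posterior(x_post, n_sum, threshold, f=lambda d: 1.0 + d):
    """Append an OOL posterior to the in-library posteriors and renormalize.

    n_sum     -- index of the last ordered posterior included in the sum
    threshold -- value the partial sum is compared against
    f         -- placeholder monotone map (an assumption of this sketch)
    """
    x_ord = sorted(x_post, reverse=True)
    # Sum the 2nd through n_sum-th largest posteriors (1-based positions).
    x_ool = sum(x_ord[1:n_sum])
    pp_ool = f(x_ool - threshold) if x_ool >= threshold else 0.0
    extended = list(x_post) + [pp_ool]
    total = sum(extended)
    return [p / total for p in extended]


# A confident in-library record: one dominant posterior.
confident = ool_posterior([0.91] + [0.01] * 9, n_sum=6, threshold=0.3)
# A flat record: posterior mass spread evenly across classes.
flat = ool_posterior([0.1] * 10, n_sum=6, threshold=0.3)
```

The confident record keeps an OOL posterior of zero, while the flat record, whose second through sixth posteriors sum to 0.5, receives a large OOL posterior after normalization.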


4.3.3 Formulation

Laine [15] lets x be a vector of decision variables defined in the MP formulation. Some decision variables such as the fusion indicator variable are discrete, while others are continuous. The MP formulation seeks to find the optimal x in the space of the discrete and continuous decision variables given an objective function and limiting constraints.

The structure of the MP formulation is flexible and can adapt to various objective functions and constraints per the goals of the warfighter or CID system analyst. What follows are example formulations given the previous discussion.
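
Objective function

$$
\max_{x \in X} \ \mathrm{TPR}(x) = \frac{P_{tp}(x)}{\text{num looks}} \qquad \text{maximize true-positive rate}
$$

Subject to:

Warfighter constraints

$$
\begin{aligned}
E_{crit} &\le \eta_1 && \text{upper bound on critical errors} \\
E_{ncrit} &\le \eta_2 && \text{upper bound on non-critical errors} \\
P_{tp} &\ge \eta_3 && \text{lower bound on true-positive performance} \\
P_{dec} &\ge \eta_4 && \text{lower bound on declaration performance} \\
P_{ool} &\ge \eta_5 && \text{lower bound on out-of-library performance}
\end{aligned}
$$

Fusion rule constraint

$$
\sum_{i=1}^{I} F_i = 1 \qquad \text{select a single fusion rule}
$$

$$
\text{where } F_i =
\begin{cases}
1 & \text{if } i\text{th fusion rule used} \\
0 & \text{otherwise}
\end{cases}
$$

Sensor selection constraint

$$
\begin{aligned}
\sum_{j=1}^{J} S_j &\le J && \text{select from available sensors} \\
\sum_{j=1}^{J} S_j &\ge 1 && \text{select at least one sensor}
\end{aligned}
$$

$$
\text{where } S_j =
\begin{cases}
1 & \text{if } j\text{th sensor used} \\
0 & \text{otherwise}
\end{cases}
$$

Threshold constraints

$$
\begin{aligned}
\theta^{ij} &\ge 0 && \text{lower threshold constraint} \\
\theta^{ij} &\le 1 && \text{upper threshold constraint}
\end{aligned}
\tag{64}
$$

where $\theta^{ij}$ is the decision threshold associated with fusion rule $i$ and sensor $j$. The decision threshold may be $\theta_{roc}$ or $\theta_{rej}$.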

Laine shows how budgetary constraints could be developed by applying a cost function to the research and development, procurement, and maintenance of fusion systems and sensors. Physical constraints, such as weight, space, and electromagnetic spectrum are also possible but not considered here.
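
To make the structure of the formulation concrete, the sketch below screens a handful of candidate configurations against the warfighter and structural constraints and keeps the feasible one with the highest TPR. It is an exhaustive screen over pre-evaluated candidates, not the nonlinear optimization itself; the candidate list, the measure values, and all names are hypothetical, and the bounds are the example values from Table 8.

```python
# Example bounds taken from Table 8.
BOUNDS = {"E_crit": 0.02, "E_ncrit": 0.05, "P_tp": 0.9, "P_dec": 0.7, "P_ool": 0.6}


def structurally_valid(c):
    """Exactly one fusion rule, at least one sensor, thresholds in [0, 1]."""
    one_rule = sum(c["F"]) == 1
    some_sensor = sum(c["S"]) >= 1
    thresholds_ok = all(0.0 <= t <= 1.0 for t in c["theta"])
    return one_rule and some_sensor and thresholds_ok


def meets_warfighter_constraints(m):
    """Upper bounds on the error measures, lower bounds on the others."""
    return (
        m["E_crit"] <= BOUNDS["E_crit"]
        and m["E_ncrit"] <= BOUNDS["E_ncrit"]
        and m["P_tp"] >= BOUNDS["P_tp"]
        and m["P_dec"] >= BOUNDS["P_dec"]
        and m["P_ool"] >= BOUNDS["P_ool"]
    )


def select_best(candidates):
    """Return the feasible candidate with the largest TPR, or None."""
    feasible = [
        c for c in candidates
        if structurally_valid(c) and meets_warfighter_constraints(c["measures"])
    ]
    return max(feasible, key=lambda c: c["measures"]["TPR"]) if feasible else None


def measures(tpr, e_crit, e_ncrit, p_tp, p_dec, p_ool):
    return {"TPR": tpr, "E_crit": e_crit, "E_ncrit": e_ncrit,
            "P_tp": p_tp, "P_dec": p_dec, "P_ool": p_ool}


# Hypothetical candidates: F and S are indicator vectors, theta = (roc, rej).
candidates = [
    {"name": "A", "F": (1, 0, 0), "S": (1, 1), "theta": (0.50, 0.10),
     "measures": measures(0.31, 0.015, 0.040, 0.93, 0.80, 0.65)},
    {"name": "B", "F": (0, 1, 0), "S": (1, 0), "theta": (0.60, 0.05),
     "measures": measures(0.35, 0.030, 0.040, 0.95, 0.90, 0.70)},
    {"name": "C", "F": (0, 0, 1), "S": (1, 1), "theta": (0.55, 0.15),
     "measures": measures(0.33, 0.018, 0.045, 0.91, 0.72, 0.61)},
    {"name": "D", "F": (1, 0, 0), "S": (0, 0), "theta": (0.50, 0.10),
     "measures": measures(0.50, 0.010, 0.010, 0.99, 0.99, 0.99)},
]
best = select_best(candidates)
```

In this toy set, candidate B has the higher TPR but violates the critical error bound, and D selects no sensor, so C is chosen. In practice each candidate's measures would come from evaluating the classifiers on test data at the given settings.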
5. Application of extended CID framework

In this chapter, the extended CID framework is exercised in a classification experiment using DCS radar data. This experiment competes an HMM-based classifier against a template-based classifier across a variety of experimental settings.

The chapter has the following sections: an introduction to the experiment, a description of the experiment data, classifier definitions, the experimental methodology, optimization framework, and experimental results. The results section is further expanded to include post-optimality analysis with the implementation of a designed experiment.

5.1 Introduction

The  goal  of  this  chapter  is  to  apply  the  extended  CID  optimization  framework 
in  an  experiment  using  observation  data  of  ground  targets  collected  from  an  airborne 
sensor.  Two  different  classifiers  compete  within  the  framework  across  a  variety  of 
experimental  settings.  Among  these  settings  are: 

•  warfighter constraints such as maximum allowable critical error rate

•  threshold  settings  for  the  classifiers  (ROC  and  rejection  region  thresholds) 

•  fusion methodology (no fusion, mean fusion rule, neural network fusion, and boolean fusion)

•  level  of  sensor  independence 

•  observation  sequence  length 

Figure 48 provides an overview of the ATR system. A region of interest is observed in time by two sensors. These sensors may be co-located on the same platform or located on separate platforms. The sensor data is processed into features which are then used to classify a target into one of four classes: target-of-the-day (TOD), other hostile (OH), friend/neutral (FN), or out-of-library (OOL). Should the classifier not have enough confidence (determined by thresholds) to label a target as belonging to one of the four classes, it applies a non-declare label.

Figure 48. Overview of ATR system with two sensors sending observations through time t = T to two classifiers whose outputs are fused into one of five labels: Target-of-the-day (TOD), Other hostile (OH), Friend/Neutral (FN), Out-of-library (OOL), or Non-declaration. (Diagram stages: Data, Features, classifier, Labels, fusion, Decision.)

5.2  Data  description 

The data set used in this experiment was collected in May 2004 at Eglin Air Force Base, Florida. The AFRL Sensor Data Management System released the data in response to a data request through the website https://www.sdms.afrl.af.mil. The collection used a General Dynamics DCS X-band synthetic aperture radar (SAR) operating in spotlight mode aboard a medium-sized, twin-engined Convair 580. The radar bandwidth was 640 MHz with a peak transmit power of 4 kW. The DCS radar imagery was collected at a resolution of 1.0 ft in two channels: HH-polarization and VV-polarization. All targets were stationary and imaged in an open area without concealment, and SAR chips of individual targets were extracted from full spotlight scenes. The SAR chips used in this experiment had 256 x 256 pixels.

Table 9. DCS Collection target list by class with description and experiment labels.

Group               Type        Target description              Tracks  Wheels  Gun  Label
Hostile             SCUD        Single Large Missile            N       8       N    TOD
                    SMERCH      MLRS Scud Confuser              N       8       N    OH1
                    SA-6 Radar  Similar to SA-6 TEL             Y       0       N    OH2
                    T-72        Main Battle Tank                Y       0       Y    OH3
                    SA-6 TEL    3 Medium SAMs                   Y       0       N    OH4
Friend and Neutral  Zil-131     Medium Budget Truck             N       4       N    FN1
                    HMMWV       Jeep like SUV                   N       4       N    FN2
                    M113        Armored Personnel Carrier       Y       0       Y    FN3
                    Zil-131     Small Budget Truck              N       4       N    FN4
                    M35         Large Budget Truck              N       4       N    FN5
Out of Library      SA-8 TZM    SA-8 Reload vehicle             N       6       N    OOL1
                    BMP-1       tank w/small turret             Y       0       Y    OOL2
                    BTR-70      8-wheeled transport             N       8       N    OOL3
                    SA-13       turret SAMs                     Y       0       N    OOL4
                    SA-8 TEL    integrated radar exposed SAMS   N       6       N    OOL5


The  DCS  collection  consists  of  two-dimensional  X-band  SAR  imagery.  Table  9 
lists  the  15  targets  contained  in  the  collection.  Ten  targets  are  in-library  targets.  The 
classifiers  are  trained  using  feature  data  from  these  targets.  The  in-library  targets  are 
grouped  into  two  classes,  hostile  and  friend/neutral.  The  SCUD  is  labeled  “target- 
of-the-day”  (TOD)  and  is  the  focus  of  the  ATR  system.  The  remaining  four  hostile 
targets  are  labeled  “other  hostile”  (OH).  The  five  friend/neutral  target  types  are 
labeled  FN. 

Five  target  types  (SA-8  TZM,  BMP-1,  BTR-70,  SA-13,  and  SA-8  TEL)  are 
grouped  into  the  out-of-library  class.  The  signatures  of  these  target  types  are  not 
used  to  train  the  classifiers  and  are  labeled  OOL. 

The DCS radar data is collected using HH and VV polarizations. In the two-sensor experiment described above, sensor 1 uses HH-polarized data and sensor 2 uses VV-polarized data.

Training and test data are segregated by depression angle from the airborne sensor to the ground vehicles at the time of collection. Flight passes at approximately 3000 and 4000 ft. correspond to sensor depression angles of 6 and 8 degrees, respectively. Data from these flight passes constitute training data. Table 10 lists the flight passes used for training data. Each flight pass images the complete target set across 90 degrees of aspect angle. Multiple flight passes provide imagery across 360 degrees of target aspect angle.

Test  data  is  collected  at  a  depression  angle  of  10  degrees  to  form  an  extended 
operating  condition  (EOC)  relative  to  the  training  data.  Flight  passes  corresponding 
to  10  degrees  of  depression  angle  are  made  at  approximately  5000  ft.  of  elevation. 
Table  11  lists  the  flight  passes  used  to  form  the  test  set. 

5.2.1  Features 

Once  grouped  into  sets  according  to  sensor  (polarization,  either  HH  or  VV), 
and  training/test  (6  and  8°  depression  angle  for  training  and  10°  depression  angle 
for  test),  the  SAR  chips  are  processed  into  HRR  profiles  per  the  steps  outlined  in 
Section  2.2.6.  These  steps  include: 

•  remove Taylor windowing and oversampling from the DCS SAR chip

•  apply  inverse  2-D  FFT 

•  convert  to  range  domain 

•  form  a  mean  HRR  profile 

Each  256  x  256  pixel  SAR  chip  is  processed  into  a  322-bin  mean  HRR  profile. 
Features  are  extracted  from  each  profile  according  to  a  maximum-value-within-bin- 
window  rule.  Each  HRR  profile  is  divided  into  10  bin  windows  near  the  center  of 
the  profile  as  shown  in  Fig.  49.  These  windows  are  defined  by  HRR  bin  ranges 
as  follows:  103-114,  115-126,  127-138,  139-150,  151-162,  163-174,  175-186,  187-198, 

Table 10. Data Selected for Training with a Desired Depression Angle of 6 or 8 Degrees

Number  Flight  Pass  Identifier  Chips  Looks per vehicle  Desired dep angle
1       1       10    FP0110      690    46                 6
2       1       11    FP0111      660    44                 6
3       1       12    FP0112      660    44                 6
4       1       13    FP0113      660    44                 6
5       1       15    FP0115      690    46                 8
6       1       16    FP0116      690    46                 8
7       1       17    FP0117      690    46                 8
8       1       18    FP0118      690    46                 8
9       1       34    FP0134      690    46                 8
10      2       12    FP0212      660    44                 6
11      2       13    FP0213      660    44                 6
12      2       14    FP0214      690    46                 6
13      2       16    FP0216      690    46                 8
14      2       17    FP0217      690    46                 8
15      2       18    FP0218      690    46                 8
16      2       19    FP0219      690    46                 8
17      2       32    FP0232      660    44                 6
18      2       33    FP0233      660    44                 6
19      2       34    FP0234      690    46                 6
20      2       35    FP0235      660    44                 6
21      2       36    FP0236      660    44                 6
22      2       37    FP0237      660    44                 6
23      2       38    FP0238      690    46                 6
24      2       39    FP0239      660    44                 6
25      3       6     FP0306      660    44                 6
26      3       7     FP0307      690    46                 6
27      3       8     FP0308      690    46                 6
28      3       9     FP0309      690    46                 6
29      3       11    FP0311      690    46                 8
30      3       12    FP0312      690    46                 8
31      3       13    FP0313      690    46                 8
32      3       14    FP0314      690    46                 8

num looks per vehicle             1448
HH looks per vehicle              724
VV looks per vehicle              724
Total number of chips processed   21720


Table 11. Data Selected for Test with a Desired Depression Angle of 10 Degrees

Number  Flight  Pass  Identifier  Chips  Looks per vehicle  Desired dep angle
1       1       20    FP0120      660    44                 10
2       1       22    FP0122      660    44                 10
3       1       23    FP0123      690    46                 10
4       1       25    FP0125      690    46                 10
5       2       21    FP0221      660    44                 10
6       2       23    FP0223      660    44                 10
7       2       24    FP0224      660    44                 10
8       2       26    FP0226      660    44                 10
9       3       16    FP0316      660    44                 10
10      3       18    FP0318      660    44                 10
11      3       19    FP0319      660    44                 10
12      3       21    FP0321      660    44                 10
13      3       28    FP0328      690    46                 10
14      3       29    FP0329      660    44                 10
15      3       31    FP0331      660    44                 10
16      3       32    FP0332      660    44                 10
17      3       33    FP0333      660    44                 10
18      3       34    FP0334      690    46                 10
19      3       35    FP0335      690    46                 10
20      3       36    FP0336      690    46                 10

num looks per vehicle             892
HH looks per vehicle              446
VV looks per vehicle              446
Total number of chips processed   13380

Figure 49. A SAR chip of a target at a specific sensor-target orientation is
processed into an HRR profile. Features are derived from the profile by
taking the maximum value within 10 range bin windows.

199-210,  and  211-222.  The  maximum  value  within  each  of  the  10  bin  windows  is 
saved  as  a  feature.  Thus  dimensionality  of  each  SAR  chip  is  reduced  from  256  x  256 
to  10. 
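
The maximum-value-within-bin-windows rule can be sketched as follows. This is a minimal illustration, assuming a one-dimensional HRR profile with at least 222 range bins and 1-indexed bin numbers; the function name `hrr_features` and the random stand-in profile are hypothetical and are not taken from the original experiment.

```python
import numpy as np

# Ten HRR range-bin windows, 1-indexed and inclusive, as listed in the text:
# 103-114, 115-126, ..., 211-222 (12 bins each).
WINDOWS = [(103 + 12 * k, 114 + 12 * k) for k in range(10)]

def hrr_features(profile):
    """Return the maximum profile value inside each of the 10 bin windows."""
    profile = np.asarray(profile, dtype=float)
    # Convert 1-indexed inclusive bounds to 0-indexed Python slices.
    return np.array([profile[lo - 1:hi].max() for lo, hi in WINDOWS])

# Hypothetical 256-bin HRR profile (random values stand in for real data).
rng = np.random.default_rng(0)
profile = rng.random(256)
features = hrr_features(profile)  # 10-dimensional feature vector
```

Each SAR chip processed this way contributes one 10-dimensional feature vector to the aspect-ordered feature data.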

The  HRR  feature  data  is  then  ordered  by  aspect  angle.  Each  10-dimensional 
feature  vector  is  derived  from  a  target  SAR  chip  and  collected  at  a  specific  sensor- 
target  orientation.  This  orientation  includes  both  the  depression  angle  from  the 
airborne  sensor  to  the  ground  vehicle  and  the  relative  aspect  angle  of  the  vehicle 
to  the  sensor  line-of-sight  in  the  horizontal  plane.  Variation  in  the  depression  angle 
separates  the  training  and  test  data,  and  variation  in  the  aspect  angle  determines  the 
target  pose.  Observing  a  sequence  of  ordered  target  poses  mimics  a  moving  target 
or  a  stationary  target  and  a  moving  sensor. 

For a given target type the training data consists of 724 SAR chips processed
into 724 HRR feature vectors. The 10-dimensional feature vectors are ordered by
increasing aspect angle (from 1 to 360 degrees) and interpolated at 0.5 degree
increments to form the complete training feature data. Figure 50 shows the
feature data from the training data set for the 10 in-library target types with
hostiles on the left and friend/neutrals on the right.

Each subplot corresponds to a specific target type and displays the 720
interpolated 10-D HRR feature vectors in order of increasing target azimuth
(aspect angle), where lighter colors correspond to greater magnitude. Variation
within a target as azimuth changes is apparent, as are differences between
target types given a 360 degree feature space representation.

The  time-series  classifier  used  in  this  experiment  acts  on  an  ordered  observation 
sequence  of  HRR  feature  vectors.  The  sequence  begins  at  a  random  starting  azimuth 
(aspect  angle)  and  covers  a  subset  of  the  360  degree  observations  seen  in  Fig.  50. 
Figure  51  shows  feature  data  for  the  5  out-of-library  target  types. 
[Figure panels: SA-6 TEL, T-72, SA-6 Radar, SMERCH, SCUD; horizontal axis:
target azimuth (90, 180, 270, 360 degrees)]

Figure 50. HRR-based feature training data for 10 in-library target types.

Figure 51. HRR-based feature test data for 5 out-of-library target types.

5.3  Classifiers 


Section 5.2.1 describes how features are extracted from sensor data. Referring
to Fig. 48, the sensor data D is processed into feature data F which is then input to
a classifier c for labeling. This section describes the two types of classifiers used in
the ATR system, an HMM-based classifier and a template-based classifier.


5.3.1  HMM-based  classifier 

The HMM-based classifier closely follows the development of the multi-dimensional
Gaussian HMM of Section 3.4.4.3. For each target type, $t \in \{1, 2, 3, \ldots, 10\}$, an
HMM $\lambda_t$ is trained using sequences of 10-dimensional feature data. There are two
sets of HMMs in the classifier. One set classifies feature data from sensor 1
(HH-polarized data) and is written $\lambda_t^1$, while another classifier set operates on the
VV-polarized data of sensor 2, $\lambda_t^2$. Thus the HMM-based ATR system under investigation
employs 20 HMMs operating on two streams of 10-dimensional time-series data.
A hidden Markov model $\lambda$ is parameterized by the hidden Markov chain
transition matrix $A$, an observation distribution matrix $B$, and an initial state
probability vector $\pi$. Design of an HMM includes several decisions regarding its
topology. Of critical importance is the number of hidden states in the Markov chain,
called the order of the HMM. Given $S$ states, the transition matrix is $S \times S$. Thus,
the number of transition parameters in the HMM grows quadratically with $S$.

The HMMs used in this experiment are of order 90. The state space is not
fully connected. To reduce the number of parameters and to more closely model the
relationship between observations sequenced by target aspect angle and the
observation distributions of each hidden state, the HMM uses a left-right state space
(see Fig. 37). In a left-right model the Markov chain may remain in the same state or
advance to the adjacent state (to the right) at each discrete time step. The state
transition matrix $A$ has entries on the main diagonal and the first superdiagonal
with zeros elsewhere, reducing the non-zero parameters of $A$ from $S^2$ to $2S$.

Another topology decision is the modeling of the observation space. The
observation space for this experiment is the 10-dimensional HRR feature vector derived
from the HRR profile. A Gaussian HMM assumes the observation space is
distributed multi-variate normal. For the HMM associated with target $t$ and sensor 1,
$B_t^1$ contains the parameter pair $\mu$ and $\Sigma$ for the multi-variate Gaussian
associated with each hidden state, where $\mu$ is a 10-d mean vector and $\Sigma$ is the
10 x 10 covariance matrix. Since the HMM has 90 hidden states, $B_t^1$ is a 2 x 90 array,
where the elements of the first column are the mean vector and covariance matrix
associated with the multi-variate Gaussian observation space of the first hidden
state. Likewise, $B_t^2$ determines the observation distributions for the HMMs
associated with sensor 2 data.
Prior to training using the Baum-Welch re-estimation algorithm, each model
is given an initial parameterization such that

$$\lambda_t^s = (A_0^s, B_0^s, \pi),$$

where $\lambda_t^s$ is the HMM for target $t$ and sensor $s$, $A_0^s$ is the initial state
transition matrix, $B_0^s$ is the initial observation distribution array, and $\pi$ is the
initial state distribution vector.

Initialization of the state transition matrix $A_0^s$ makes use of the left-right model
paradigm. The entries along the main diagonal and the first superdiagonal are set to
0.5 and all other entries are set to 0, so that $A_0^s$ is a stochastic matrix with rows
summing to 1.
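
A minimal sketch of this initialization is given below. The text does not state how the last state is handled, since it has no state to its right; the `wrap` option here is an assumption for illustration. With `wrap=True` the last state advances back to the first state (the aspect angle is circular), and with `wrap=False` the last state keeps all of its probability mass. Either choice keeps every row summing to 1.

```python
import numpy as np

def init_left_right_transition(num_states, wrap=True):
    """Initial left-right transition matrix with 0.5 on the main diagonal
    and 0.5 on the first superdiagonal."""
    A = np.zeros((num_states, num_states))
    for i in range(num_states):
        A[i, i] = 0.5
        if i + 1 < num_states:
            A[i, i + 1] = 0.5
        elif wrap:
            A[i, 0] = 0.5   # assumption: last state advances to the first
        else:
            A[i, i] = 1.0   # assumption: last state is absorbing
    return A

A0 = init_left_right_transition(90)
```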

The initial parameters for the observation distributions are also linked to the
left-right model. As shown in Fig. 52 each hidden state observation distribution
covers a window of target aspect angle. The elements of the initial observation
distribution array $B_0^s$ are found by determining the sample mean and covariance of
the feature vectors within the aspect window of each hidden state. Because each
$\lambda_t^s$ has 90 states and the feature space includes observations from 1 to 360 degrees
of aspect angle, each state covers an aspect window of 4 degrees.

Given $F_t^s$, the 10-dimensional feature data for target $t$ and sensor $s$, the first
column of entries of the observation distribution array $B_0^s$ corresponds to the mean
vector and covariance matrix of the feature data from aspect angle 1 through 4 and
is associated with the first hidden state

$$B_0^s(:,1) = \begin{bmatrix} \mu_1 \\ \Sigma_1 \end{bmatrix} = \begin{bmatrix} \mathrm{mean}(F_t^s(:,1{:}4)) \\ \mathrm{cov}(F_t^s(:,1{:}4)) \end{bmatrix}$$
Figure 52. Multi-dimensional observations in the feature space are linked to the
observation distributions of the hidden states. Note: features of dimension 7 are
shown here; the DCS experiment uses features of dimension 10.

where the notation ":" indicates all elements of the specified dimension of the array.
The last column of $B_0^s$ is associated with the 90th hidden state and is

$$B_0^s(:,90) = \begin{bmatrix} \mu_{90} \\ \Sigma_{90} \end{bmatrix} = \begin{bmatrix} \mathrm{mean}(F_t^s(:,357{:}360)) \\ \mathrm{cov}(F_t^s(:,357{:}360)) \end{bmatrix}$$
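
The windowed mean and covariance computation can be sketched as below, assuming the feature data is stored as a 10 x 360 array with one column per degree of aspect angle. The function name and the small `ridge` term added to each covariance are assumptions for illustration: a sample covariance estimated from only a few 10-dimensional vectors is rank deficient, and the source does not say how this was handled.

```python
import numpy as np

def init_observation_params(F, num_states=90, ridge=1e-6):
    """Sample mean and covariance of the feature vectors in each state's
    aspect window. F has shape (d, n): d features by n aspect samples."""
    d, n = F.shape
    width = n // num_states          # 4 columns per state for n=360, S=90
    means = np.empty((num_states, d))
    covs = np.empty((num_states, d, d))
    for k in range(num_states):
        window = F[:, k * width:(k + 1) * width]
        means[k] = window.mean(axis=1)
        # np.cov treats rows as variables and columns as observations.
        # The ridge term is an assumed regularizer, not from the source.
        covs[k] = np.cov(window) + ridge * np.eye(d)
    return means, covs
```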

The  initial  state  distribution  allows  the  user  to  control  the  starting  state  of 
the  Markov  chain.  As  described  above,  the  observations  are  ordered  by  aspect  angle 
beginning  at  1  degree  and  ending  with  360  degrees.  The  observations  are  a  function  of 
the  aspect  angle  of  the  target;  that  is,  when  viewed  from  a  certain  aspect  window,  the 
observations  come  from  a  specific  state  in  the  hidden  process.  Given  an  observation 
sequence  that  begins  at  a  target  aspect  angle  of  1  degree,  the  model  can  be  forced  to 
start in state 1 by setting the first element of $\pi$ to 1 with zeros elsewhere. Therefore,
independent of target type $t$ and sensor $s$ the initial state distribution for the hidden
state space always begins in state 1,

$$\pi_i = \begin{cases} 1 & \text{for } i = 1 \\ 0 & \text{otherwise} \end{cases}$$

where $i$ is the hidden state number.
With the HMMs initialized as described above, training ensues using the
Baum-Welch re-estimation algorithm of Sec. 2.1.3.6, whereby the initial parameters of the
HMMs are iteratively updated until a threshold is reached. Training records consist
of the entire 10-dimensional feature data $F_t^s$ for target $t$ and sensor $s$. The
training records begin with feature data at aspect angle 1, progress through the aspect
window, and end at aspect angle 360. Once trained, each HMM ($\lambda_t^s$) is ready to be
employed as a classifier in the experiment.

5.3.2  Template-based  classifier 

A competitor classifier using templates is described in this section. The
classifier follows Laine [15], Meyer [60], and Duda et al. [20] in using Mahalanobis
distance as a classification measure. Mahalanobis distance is

$$\Delta^2 = (\mu - x)^T \Sigma^{-1} (\mu - x), \tag{65}$$

where $\mu$ is the population mean, $T$ denotes matrix transpose, $\Sigma^{-1}$ is the inverse
covariance matrix of the population, and $x$ is the test vector whose distance (squared)
from the population is $\Delta^2$.
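
Equation (65) can be evaluated as in the sketch below. Solving the linear system instead of forming the explicit inverse of the covariance matrix is an implementation choice made here for numerical stability, not something stated in the source; the function name is illustrative.

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (mu - x)^T Sigma^{-1} (mu - x)."""
    diff = np.asarray(mu, dtype=float) - np.asarray(x, dtype=float)
    # Solve Sigma * v = diff rather than inverting Sigma explicitly.
    return float(diff @ np.linalg.solve(np.asarray(sigma, dtype=float), diff))
```

With an identity covariance matrix the result reduces to the squared Euclidean distance between the test vector and the mean.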

Templates are formed using the 10-dimensional feature data for each target
$t$ and each sensor $s$. The feature data is divided into 24 wedges of 15 degrees,
which together cover the entire 360 degree aspect of the feature data. A sample mean
and covariance are taken from each of the 24 wedges, forming a template array $T_t^s$ for
each target type $t$ and each sensor $s$. Descriptive statistics for the wedges are used
to define the populations in the calculation of Mahalanobis distance.

The elements of the first column of the template array are determined by
finding the mean and covariance of the feature vectors within the first aspect wedge
(1 through 15 degrees of aspect angle)

$$T_t^s(:,1) = \begin{bmatrix} \mu_1 \\ \Sigma_1 \end{bmatrix} = \begin{bmatrix} \mathrm{mean}(F_t^s(:,1{:}15)) \\ \mathrm{cov}(F_t^s(:,1{:}15)) \end{bmatrix}$$

With the template arrays formed, each $T_t^s$ is ready to be employed as a classifier in
the experiment.

5.4 Methodology

At the heart of the extended CID optimization framework is the ATR system
which labels observation sequences of unknown target type with one of five
labels: target-of-the-day (TOD), other hostile (OH), friend/neutral (FN),
out-of-library (OOL), or non-declare (Non-dec). Table 9 lists the 15 target types used in
the experiment: 10 in-library types and 5 out-of-library types. This section describes
the process of classification given trained HMM and template classifiers.

An HMM-based classifier consists of 20 models $\lambda_t^s$, where $t = 1, 2, \ldots, 10$
(in-library target types) and $s = 1, 2$ (sensors). Given a target under test, each sensor
produces an observation sequence through a specific wedge of aspect angle. The
observation sequences are processed into sequences of features. Feature sequences
from sensor 1 are evaluated by the sensor 1 HMMs, $\lambda_t^1$, and sensor 2 sequences are
evaluated by sensor 2 HMMs, $\lambda_t^2$. Classifier outputs are post-processed according to
a fusion rule and an out-of-library discriminator before label thresholding occurs.

A similar process unfolds for the template-based classifier. The parameterized
template arrays $T_t^s$ are used to find a minimum Mahalanobis distance across target
type when given test data. Classifier outputs are post-processed according to a
fusion rule, an out-of-library assessment is made, and finally a label is assigned as a
function of ROC and rejection thresholding.

Figure 53. Experimental flowchart with HMM-based classifiers.


Figure  53  provides  an  experiment  flowchart  for  the  HMM  case.  The  template 
case  is  similar  with  the  exception  of  the  classifier  training  routine.  The  following 
sections  provide  details  of  the  classification  methodology  for  the  HMM-based  and 
template-based  classifiers. 


5.4.1 Test sequence generation

As described in Sec. 5.2, the DCS data are segregated into training and test
data as a function of depression angle. The data used to train the classifiers are
collected at depression angles of 6 and 8 degrees, while the test data are collected
at 10 degrees. Test sequences are drawn from the ordered 10-dimensional feature
data resulting from processing SAR chips into HRR profiles and then applying the
maximum-value-within-bin-windows rule of Sec. 5.2.1. The same notation is used to
denote testing feature data as is used for training data, i.e., $F_t^s$, but $t = 1, 2, \ldots, 15$
indexes both in-library and out-of-library target types and $s$ indexes sensors, with
the understanding that while the notation is the same, the feature data is not.

One hundred test records are generated for each in-library target type (10)
and twenty test records are generated for each out-of-library type (5) for a total of
1100 test records. Let $Y_t^s$ be an array containing the test sequences from target
$t = 1, 2, \ldots, 15$ and sensor $s$. Because the interest is in time-series classification,
each test record is an ordered sequence of feature observations. Each test record
begins at a randomly chosen aspect angle and includes a pre-determined number of
observations. For example, if the observation length is 10 degrees then each test
record $y \in Y_t^s$ begins at a randomly selected starting aspect angle and covers an
aspect window of 10 degrees. Thus each $y$ is a subset of the full aspect feature data
for target $t$ and sensor $s$. Each $Y_t^s$ contains test sequence data and is presented to
both HMM- and template-based classifiers for classification.
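
Drawing one test record can be sketched as follows, assuming the test feature data is stored as a 10 x 360 array with one column per degree of aspect angle. Wrapping a record that runs past 360 degrees back to the start of the array is an assumption made for this sketch; the source does not say how records near the end of the aspect range are handled. The function name is hypothetical.

```python
import numpy as np

def draw_test_record(F, length, rng):
    """Draw one ordered test record of `length` consecutive aspect samples,
    starting at a randomly chosen column of F (shape d x n)."""
    n = F.shape[1]
    start = int(rng.integers(n))
    # Assumption: indices wrap around past the last aspect sample.
    idx = (start + np.arange(length)) % n
    return start, F[:, idx]

rng = np.random.default_rng(0)
F_test = rng.random((10, 360))    # stand-in for real feature data
start, record = draw_test_record(F_test, length=10, rng=rng)
```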

5.4.2 Classifier testing

Test records $Y_t^s$ are presented to the HMM-based and template-based classifiers
differently. The HMM-based classifiers $\lambda_t^s$ are given the sequenced test data contained
in $Y_t^s$. The methodology used in this experiment presents all 1100 sensor 1 test
records ($Y_t^1$ for $t = 1, 2, \ldots, 15$) to each sensor 1 HMM ($\lambda_t^1$ for $t = 1, 2, \ldots, 10$). Each
record is evaluated by each HMM for a total of 11,000 evaluations. The process is
repeated for sensor 2 data.

Each test record is evaluated by 10 target-specific HMMs, producing 10
log-likelihoods per the calculations of Sec. 2.1.3.4. Class membership can be assigned at
this point by choosing the model associated with the greatest log-likelihood among
the 10. In this experiment, assignment of class membership is delayed until classifier
output from both sensors is fused.
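
The log-likelihood of a test record under one Gaussian HMM can be sketched with a log-domain forward recursion, as below. This assumes the calculation referenced above is the standard forward algorithm; the function names and the array layout (one mean vector and one covariance matrix per hidden state) are illustrative and are not taken from the source.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal at x."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def forward_loglik(obs, A, means, covs, pi):
    """Return log P(obs | model) for an observation sequence obs (T x d)."""
    S = A.shape[0]
    with np.errstate(divide="ignore"):   # log(0) = -inf is intended here
        logA = np.log(A)
        logpi = np.log(pi)
    # logb[t, j] = log density of observation t under state j.
    logb = np.array([[gaussian_logpdf(o, means[j], covs[j]) for j in range(S)]
                     for o in obs])
    alpha = logpi + logb[0]
    for t in range(1, len(obs)):
        # Log-sum-exp over the previous state for each current state.
        alpha = logb[t] + np.logaddexp.reduce(alpha[:, None] + logA, axis=0)
    return float(np.logaddexp.reduce(alpha))
```

Evaluating a record against all 10 target models of a sensor gives the 10-element log-likelihood vector; taking the index of its largest entry would assign a class at this stage.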
The template-based classifier is not a time-series classifier in the same sense as
the HMM. Instead of taking a sequence of observation data as input, the
template-based classifier takes a vector as input to the Mahalanobis distance calculation.
Given a test record $y \in Y_t^s$ that covers an aspect window of 10 degrees, the test
vector $x$ is formed by finding the mean of $y$. As in the HMM case, there are 1100
test records (vectors) in the template case.

Each test vector is used to find the Mahalanobis distance from each 15 degree
aspect wedge of each target:

$$\Delta^2 = (\mu - x)^T \Sigma^{-1} (\mu - x), \tag{66}$$

where $T_t^s$ contains the $\mu$ and $\Sigma$ for each aspect wedge of target $t$ and sensor $s$.

The smallest Mahalanobis distance $\Delta_t^{\min}$ across the 24 aspect wedges for each
target $t$ is collected for each test record. Class membership can be assigned at this
point by choosing the template associated with the smallest Mahalanobis distance
among the 10. In this experiment, assignment of class membership is delayed until
classifier output from both sensors is fused.
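
The wedge templates and the minimum-distance search can be sketched as below, assuming a 10 x 360 feature array with one column per degree of aspect angle. The function names are hypothetical, and the sketch returns the minimum of the squared distance of Eq. (66).

```python
import numpy as np

def build_templates(F, wedge_deg=15):
    """Sample mean and covariance for each aspect wedge of F (shape d x n)."""
    d, n = F.shape
    n_wedges = n // wedge_deg            # 24 wedges for n = 360
    means = np.empty((n_wedges, d))
    covs = np.empty((n_wedges, d, d))
    for w in range(n_wedges):
        wedge = F[:, w * wedge_deg:(w + 1) * wedge_deg]
        means[w] = wedge.mean(axis=1)
        covs[w] = np.cov(wedge)          # rows are variables
    return means, covs

def min_template_distance(x, means, covs):
    """Smallest squared Mahalanobis distance from x to any wedge template."""
    dists = []
    for mu, sigma in zip(means, covs):
        diff = mu - x
        dists.append(diff @ np.linalg.solve(sigma, diff))
    return float(min(dists))
```

Repeating the search for each of the 10 in-library targets gives the 10-element min-distance vector for a test record.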

5.4.3 Classifier post-processing

Post-processing  covers  the  steps  from  classifier  output  to  classifier  labeling. 
Included  in  these  steps  are 

•  Conversion  from  classifier  output  to  estimated  posterior  probabilities  given  a 
test  record 

•  Discrimination  between  in-library  and  out-of-library  records  given  an  estimated 
posterior  probability  of  membership  in  the  10  in-library  target  types 

•  Labeling  of  class  as  a  function  of  the  ROC  and  rejection  thresholds 
The post-processing steps are described for the simple case of a single
sensor without fusion. The HMM-based classifier system uses 10 models $\lambda_t^1$,
$t = 1, 2, \ldots, 10$, trained with feature data derived from sensor 1 (HH polarized SAR
chips). Given a test record $y$, the 10 models output 10 log-likelihoods. The
log-likelihoods are exponentiated and normalized to produce an estimated posterior
probability for each target type $pp_t$ for $t = 1, 2, \ldots, 10$.
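
The exponentiate-and-normalize step can be sketched as follows. Subtracting the largest log-likelihood before exponentiating is an implementation detail added here to avoid numerical underflow; it cancels in the normalization and does not change the result. The function name is illustrative.

```python
import numpy as np

def loglik_to_posterior(loglik):
    """Exponentiate and normalize log-likelihoods into estimated posteriors."""
    loglik = np.asarray(loglik, dtype=float)
    shifted = loglik - loglik.max()   # the shift cancels after normalization
    w = np.exp(shifted)
    return w / w.sum()
```

Sequence log-likelihoods are often large negative numbers, so exponentiating them directly can underflow to zero for every model.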


In the template-based classifier case, the min-distance results $\Delta_t^{\min}$ are mapped
to a [0, 1] interval using

$$z_t = \frac{1}{e^{\frac{1}{2}\Delta_t^{\min}}} \tag{67}$$

and are then normalized across the 10 in-library target types into posterior estimates

$$pp_t = \frac{z_t}{\sum_{j=1}^{10} z_j} \quad \text{for } t = 1, 2, \ldots, 10. \tag{68}$$


Whether derived from HMM or template outputs, the posterior estimates $pp_t$
are processed by an in-library/out-of-library discriminator in the same fashion. The
purpose of the discriminator is to assign an 11th posterior as a function of the 10
posteriors contained in $pp_t$. The 11th posterior determines the estimated probability
of membership in the out-of-library class.

Given the 10-D in-class posterior probability vector

$$x_{post} = [\, pp_{TOD} \;\; pp_{OH1} \;\; pp_{OH2} \;\; \cdots \;\; pp_{FN5} \,],$$

the discrimination function sorts the posteriors in descending order, producing $x_{ord}$.
Assuming the classifier identifies in-library targets well, a small subset of the class
posteriors is significantly larger than the remaining posteriors.
In-library/out-of-library discrimination results from a threshold setting based on the sum of a subset
of ordered posteriors.
Two threshold parameters are chosen through a nearly blind sub-optimization
routine. The parameters are the number of ordered (largest to smallest) posteriors
over which to sum, $\theta_{OOL}^{(1)}$, and the threshold $\theta_{OOL}^{(2)}$. The parameter values are
chosen to ensure some minimum discrimination performance given in-library and
out-of-library records, hence the nearly blind description. For example, the
sub-optimization routine may determine a threshold $\theta_{OOL}^{(2)}$ based on the sum of the second
through sixth ordered posteriors, $\theta_{OOL}^{(1)} = 6$. Thus given $x_{ord}$ for a sample test
record, the discriminator compares

$$x_{ool} = \sum_{i=2}^{6} x_{ord}(i),$$

the sum of the second through sixth ordered posteriors for the test record, with the
threshold $\theta_{OOL}^{(2)}$ and assigns the OOL posterior as a function of the distance from the
threshold

$$pp_{ool} = \begin{cases} 0 & \text{if } x_{ool} < \theta_{OOL}^{(2)}, \\ f(x_{ool} - \theta_{OOL}^{(2)}) & \text{if } x_{ool} \geq \theta_{OOL}^{(2)}. \end{cases}$$

If $x_{ool} < \theta_{OOL}^{(2)}$, then the record is from an in-library class and the posterior
probability for OOL is set to zero. If $x_{ool} \geq \theta_{OOL}^{(2)}$, then the record is potentially an
out-of-library record and the posterior probability for OOL is set to a
monotonically-increasing function of the distance from the threshold:

$$f(d) = \frac{2}{1 + e^{-10d}},$$

where $d = x_{ool} - \theta_{OOL}^{(2)}$. Since $x_{ool} \in [0, 9/10]$ and $\theta_{OOL}^{(2)} \in [0, 1]$, then $d \in [0, 9/10]$
and $f$ maps $d$ to [1, 1.999]. Finally $pp_{ool}$ is concatenated to the end of the
10-element estimated posterior vector $x_{post}$ and normalized to produce the estimated
11-element posterior probability vector.
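
The discriminator can be sketched as follows. The threshold value 0.3 is a hypothetical placeholder, since the source obtains it from a sub-optimization routine. The logistic form of `f` with numerator 2 and slope 10 is an assumption: the printed equation is damaged in this copy, and these constants were chosen because they map the stated domain [0, 9/10] onto roughly [1, 1.999].

```python
import numpy as np

def ool_posterior(x_post, n_sum=6, threshold=0.3, steepness=10.0):
    """Append an out-of-library posterior to a 10-element posterior vector
    and renormalize to an 11-element vector."""
    x_post = np.asarray(x_post, dtype=float)
    x_ord = np.sort(x_post)[::-1]        # descending order
    x_ool = x_ord[1:n_sum].sum()         # 2nd through n_sum-th ordered values
    if x_ool < threshold:
        pp_ool = 0.0                     # treated as an in-library record
    else:
        d = x_ool - threshold
        # Assumed form of f(d); increases monotonically from f(0) = 1.
        pp_ool = 2.0 / (1.0 + np.exp(-steepness * d))
    full = np.append(x_post, pp_ool)
    return full / full.sum()
```

A record with one dominant posterior keeps a zero OOL posterior, while a record whose posteriors are spread evenly across the in-library types receives a large one.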

The final step is assigning one of five labels to the test record. The five
labels (TOD, OH, FN, OOL, or Non-declare) are assigned as a function of the
11-dimensional posterior probability vector $x_{post}$ and the threshold settings $\theta_{ROC}$ and
$\theta_{REJ}$, which define the rejection region. Figure 54 provides a roadmap for the labeling
process.

Figure 54. Labeling process and measures of performance (MOP) for the DCS
experiment as a function of $\theta_{ROC}$ and $\theta_{REJ}$ thresholds.

As shown in the figure, the 11-dimensional posterior probability vector $x_{post}$
is converted to a four-class $x_{class}$ and a two-class $x_{HF}$ posterior vector by summing
the posteriors related to the four true target classes (TOD, OH, FN, and OOL), and
finally separating the posteriors into two super-classes (H = TOD + OH, and FNO
= FN + OOL).
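
The collapse from 11 posteriors to four classes and then two super-classes can be sketched as below. The index layout (one TOD, four OH, five FN, one OOL) is an assumption for illustration; this section does not state how the in-library types are ordered in the vector. The function name is hypothetical.

```python
import numpy as np

# Assumed layout of the 11-element posterior vector (not stated in the text):
# index 0 = TOD, 1-4 = other hostile, 5-9 = friend/neutral, 10 = OOL.
def collapse_posteriors(x_post):
    """Return the four-class vector [TOD, OH, FN, OOL] and the two-class
    vector [H, FNO] by summing the relevant posteriors."""
    x = np.asarray(x_post, dtype=float)
    x_class = np.array([x[0], x[1:5].sum(), x[5:10].sum(), x[10]])
    x_hf = np.array([x_class[0] + x_class[1], x_class[2] + x_class[3]])
    return x_class, x_hf
```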

A rejection region is determined by $\theta_{ROC}$ and $\theta_{REJ}$. The two-class posterior
vector $x_{HF}$ is adjudicated with the rejection region, resulting in either a hostile
declaration, a friend/neutral/out-of-library declaration, or a “Non-declare” label. If a
hostile declaration is made, the associated four-class posterior vector $x_{class}$ is
adjudicated to determine whether the test record is assigned a “TOD” or “OH” label. If a
friend/neutral/out-of-library declaration is made, the associated four-class posterior
vector $x_{class}$ is adjudicated to determine whether the test record is assigned a “FN”
or “OOL” label.

Each test record is evaluated at a specific ($\theta_{ROC}$, $\theta_{REJ}$) setting. A confusion
matrix is built using label versus truth for the test records at each threshold setting.
Performance measures are collected per the calculations of Sec. 4.3.2.

5.4.4 Fusion methods

The DCS experiment makes use of three different fusion methods to combine
the classifier outputs of the two sensors. The first two fusion methods, mean fusion
and neural network fusion, combine the classifier outputs prior to labeling. Rejection
region thresholding is applied to the fused 11-dimensional posterior vector $x_{post}$. In
the third case, label fusion, each classifier output is adjudicated by the rejection
region producing one of five labels (TOD, OH, FN, OOL, or Non-declare). Two sets
of labels are produced, one by classifier 1 and another by classifier 2. The labels are
fused according to a set of label rules that map all possible label pairs into a final
fused label. This section examines the three methods of fusion.

Figure 55 shows the process for the fusion methods with the simple mean
fusion rule on top. In the HMM case, given a test record, each set of sensor-specific
HMMs $\lambda_t^s$ produces a 10-dimensional vector of log-likelihoods. The mean fusion
rule simply finds the mean of the two 10-dimensional log-likelihood vectors. The
10-dimensional mean vector is then exponentiated and normalized to produce a
10-dimensional estimated posterior probability vector. After adding an 11th posterior
via the in-library/out-of-library discriminator, the posterior vector $x_{class}$ is
adjudicated according to the rejection region thresholds, producing a final label.
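
A minimal sketch of the mean fusion rule for the HMM case follows. The function name is illustrative, and the max-shift before exponentiation is a numerical-stability detail added here, not part of the source description.

```python
import numpy as np

def mean_fusion_posterior(loglik_s1, loglik_s2):
    """Average two 10-element log-likelihood vectors (one per sensor),
    then exponentiate and normalize into estimated posteriors."""
    fused = 0.5 * (np.asarray(loglik_s1, dtype=float)
                   + np.asarray(loglik_s2, dtype=float))
    w = np.exp(fused - fused.max())   # the shift cancels after normalization
    return w / w.sum()
```

Because the mean is taken in the log domain, this rule is equivalent to normalizing the geometric mean of the two sensors' likelihoods.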

In the case of template classifiers, the mean fusion rule is applied to the
10-dimensional minimum Mahalanobis distance vectors $\Delta^{\min}$ associated with each
sensor. Here the mean of the two min-distance vectors is produced by the mean fusion
rule. The mean vector is then mapped to the interval [0, 1] and normalized into a
10-dimensional estimated posterior probability vector. After adding an 11th posterior
via the in-library/out-of-library discriminator, the posterior vector is adjudicated
according to the rejection region thresholds, producing a final label.

[Figure 55 diagram. Mean and neural fusion: the target is observed by sensor 1
(HH polar) and sensor 2 (VV polar), the classifier outputs are fused, and the
rejection window is applied. Label fusion: each classifier output is adjudicated
separately and the resulting labels are fused.]

Labels from Classifiers 1 and 2 are fused using a set of label rules:

Classifier 1  Classifier 2  Fused label
TOD           TOD           TOD
OH            OH            OH
TOD           OH            OH
OH            TOD           OH
FN            FN            FN
FN            OOL           FN
OOL           FN            FN
OOL           OOL           OOL
otherwise                   Non-declare

Figure 55. Fusion methods

The neural network fusion method is similar to the mean fusion method. It
takes 10-dimensional classifier output from the two sensors and produces a single
fused 10-dimensional vector. Instead of a simple mean rule, the neural network
fusion rule employs a multi-layer perceptron neural network (MLPNN) to fuse the
two sets of inputs.

The MLPNN takes an input vector comprised of the two 10-dimensional
classifier output vectors (either log-likelihood in the case of HMMs or min-distance for
the template case) concatenated to form a vector of length 20. The trained MLPNN
then maps the input vector to a 10-dimensional output vector whose entries are in
the range [0, 1].




The structure of the MLPNN used in the experiment has 20 input nodes, 40
hidden layer nodes, and 10 output nodes. A tan-sigmoid transfer function is used
for the hidden layer while a log-sigmoid transfer function is used at the output layer.
The input data is pre-processed to the range [-1, 1].
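
The forward pass of a network with this shape can be sketched as below. The weights are random placeholders that only illustrate the 20-40-10 structure; they are not trained values from the experiment. The tan-sigmoid function is mathematically identical to the hyperbolic tangent, and the log-sigmoid is the standard logistic function.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass: 20 inputs -> 40 tan-sigmoid units -> 10 log-sigmoid units."""
    hidden = np.tanh(W1 @ x + b1)                       # tan-sigmoid layer
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))    # log-sigmoid layer

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(40, 20)), np.zeros(40)   # placeholder weights
W2, b2 = rng.normal(size=(10, 40)), np.zeros(10)
x = rng.uniform(-1.0, 1.0, size=20)   # two concatenated, pre-scaled outputs
y = mlp_forward(x, W1, b1, W2, b2)
```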

Training  of  the  MLPNN  uses  sequences  from  the  training  data  set  (6  and 
8  degree  depression  angle)  to  produce  output  from  the  HMM  and  template-based 
classifiers.  These  outputs  are  used  as  training  input  for  the  MLPNNs.  The  inputs 
are  targeted  against  the  known  true-class  of  the  input  vectors.  MLPNN  training 
uses  a  gradient-descent  method  with  momentum  to  determine  network  weights  and 
biases. 

The final method is label fusion. As mentioned earlier, the label method
combines labels instead of classifier outputs. Figure 55 shows the label fusion process
and includes the set of label rules used in the experiment.
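
The label rules of Figure 55 can be expressed as a lookup table, as in the sketch below. The dictionary and function names are illustrative; any label pair not listed in the figure, including any pair containing a Non-declare label, falls through to Non-declare.

```python
# Label rules from Figure 55: (classifier 1 label, classifier 2 label) -> fused.
LABEL_RULES = {
    ("TOD", "TOD"): "TOD",
    ("OH", "OH"): "OH",
    ("TOD", "OH"): "OH",
    ("OH", "TOD"): "OH",
    ("FN", "FN"): "FN",
    ("FN", "OOL"): "FN",
    ("OOL", "FN"): "FN",
    ("OOL", "OOL"): "OOL",
}

def fuse_labels(label_1, label_2):
    """Fuse two classifier labels; unlisted pairs become Non-declare."""
    return LABEL_RULES.get((label_1, label_2), "Non-declare")
```

Under these rules, a hostile label from one classifier paired with a friend/neutral or out-of-library label from the other always yields a Non-declare.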

The  threshold  space  used  in  the  label  fusion  rule  is  quadratically  larger  than  the 
other  fusion  rules.  This  result  follows  from  performing  rejection  region  adjudication 
for  each  classifier,  which  squares  the  number  of  threshold  settings. 

5.4.5 Prior knowledge of target aspect

One factor influencing classifier performance is prior knowledge of the target
aspect angle. In the case of the template-based classifier, prior knowledge of target
aspect angle reduces the number of aspect wedges involved in the minimum
Mahalanobis distance calculation. If target aspect is known so that the target aspect
angle falls within a specific aspect wedge, then the min-distance calculation is
reduced from 24 wedges per target to 1 wedge per target, decreasing the chance for
classifier error.

In Laine’s research [15], prior aspect knowledge is assumed to be within ± 22.5°
due to target tracking information handoff to the ATR. Using this level of prior target
aspect information corresponds to 3 aspect wedges (3 x 15° = 45°) in the case of the
template-based classifier. Thus, when searching for the min-distance Mahalanobis
measurement, only the true wedge and its nearest neighbor on either side are used.

Table 12. Prior aspect distribution for HMM ATR

Number HMM States  Aspect Wedge  Half-Width  States to cover ±22.5°
90                 4° per state  ±2°         11 states (±22°)
72                 5°            ±2.5°       9 states (±22.5°)
60                 6°            ±3°         8 states (±24°)
40                 9°            ±4.5°       5 states (±22.5°)
30                 12°           ±6°         4 states (±24°)
20                 18°           ±9°         3 states (±27°)
10                 36°           ±18°        1 state (±18°)

For the HMM-based classifier, ensuring a specific level of target aspect angle knowledge is more problematic. The solution makes use of the relationship between hidden states and aspect angle windows. Table 12 lists the number of states associated with a ±22.5° aspect window given that there are $S$ states in the Markov chain. As $S$ increases, the number of states required to cover the aspect window increases. For the case $S = 90$, 11 states are required to cover the aspect window.
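The state counts in Table 12 can be reproduced with a short calculation. The rounding rule used here (round the ratio of window width to wedge width to the nearest integer, halves rounded up) is an assumption inferred from the tabulated values, not a rule stated in the text.

```python
def states_to_cover(num_states, half_window_deg=22.5):
    """Number of hidden states spanning a +/- half_window_deg aspect window."""
    wedge_deg = 360.0 / num_states           # aspect wedge per hidden state
    ratio = (2.0 * half_window_deg) / wedge_deg
    # Round half up (assumed rule); built-in round() rounds halves to even.
    return max(int(ratio + 0.5), 1)

def covered_half_width(num_states, half_window_deg=22.5):
    """Half-width in degrees actually covered by the selected states."""
    n = states_to_cover(num_states, half_window_deg)
    return n * (360.0 / num_states) / 2.0
```

For example, `states_to_cover(90)` gives 11 states covering ±22°, and `states_to_cover(60)` gives 8 states covering ±24°, matching the corresponding rows of the table.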

Given a test sequence that begins at angle $\alpha$ and an HMM with 90 hidden states such that each hidden state is associated with an aspect window of 4°, $\alpha$ corresponds to a specific aspect window and hence a specific hidden state called $s^*$. With perfect prior knowledge of target aspect angle $\alpha$, the HMM prior state distribution $\pi$ used in the evaluation of the test sequence sets $\pi(s^*) = 1$ and 0 elsewhere. With imperfect knowledge, a uniform distribution is centered on $\pi(s^*)$. For $S = 90$ and aspect knowledge limited to ±22.5°, the uniform distribution covers 11 states centered on $\pi(s^*)$.
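A minimal sketch of this prior construction follows. It assumes that hidden states are indexed in order of increasing aspect angle, that state 0 covers [0°, 4°) for $S = 90$, and that the window wraps around 360°. These indexing conventions and the function name are illustrative assumptions.

```python
import numpy as np

def aspect_prior(alpha_deg, num_states=90, num_covered=11):
    """Return (s_star, pi): the state containing alpha and the prior over states.

    num_covered = 1 corresponds to perfect aspect knowledge;
    num_covered = 11 corresponds to +/- 22.5 degrees when num_states = 90.
    """
    wedge_deg = 360.0 / num_states
    s_star = int((alpha_deg % 360.0) // wedge_deg)
    pi = np.zeros(num_states)
    offset = num_covered // 2
    # States centered on s_star, wrapping around the circle of aspect angles.
    # For an even num_covered the window is one state longer on the low side.
    covered = [(s_star - offset + k) % num_states for k in range(num_covered)]
    pi[covered] = 1.0 / num_covered
    return s_star, pi
```

Passing `num_covered=1` recovers the perfect-knowledge case, in which all prior mass sits on $s^*$.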

Analysis of the raw SAR image data reveals the ±22.5° assumption to be achievable through simple image analysis. Figure 56 shows the steps used in this image processing analysis. The raw data is the collection of SAR chips of a specific target type (T-72 tank), each collected at a specific sensor-target orientation. The SAR chips are ordered by target aspect angle, and each chip is processed according to the following steps:

•  The complex SAR chip is separated into its component real and imaginary parts.

•  Each sub-chip is filtered according to descriptive statistics of the pixel values.

•  A binary mask is created according to a boolean rule involving the filtered real and imaginary sub-chips.

•  The initial mask is manipulated using various image processing routines.

•  A final binary mask is created.

•  Principal component analysis (PCA) is performed on the binary mask to yield a major axis of the mask which is used to estimate the target aspect angle.

•  The true aspect angle is compared with the PCA-derived estimate.

Figure 56. SAR chip image processing flowchart. Panels: original complex SAR chip (example MSTAR T-72 chip); filtered real data and filtered imaginary data (pixel set to 0 if -3σ < x < 3σ, pixel value kept otherwise); binary filter (0 where x = 0 in both the real and imaginary parts, set where x ≠ 0 in either); initial mask; mask processing (step 1: dilate; step 2: erode; step 3: majority morph; step 4: dilate; step 5: erode; perimeter; convex hull); final mask.
The application of PCA to the pose-estimation problem begins with the binary target mask $B$ based on the 128 × 128 pixel complex SAR chip. Thus $B$ is a 128 × 128 matrix with elements $b_{ij} \in \{0, 1\}$. Next, the row and column locations of the elements of $B$ where $b_{ij} = 1$ are placed in $C$, an $n \times 2$ matrix, where $n$ is the number of target pixels in the target mask.

Principal component analysis is then performed on the two-dimensional data of $C$. First, the column-wise means are subtracted from the data, leaving the centered data $C_0$. Next, the normalized covariance matrix $\Sigma$ of $C_0$ is determined. Then the eigenvectors $x$ of $\Sigma$ are found as solutions of

\[
(\Sigma - \lambda I)x = 0,
\]

where $I$ is the identity matrix and $\lambda$ are the eigenvalues associated with the eigenvectors $x$. Since $C_0$ is two-dimensional, the eigenvector associated with the largest eigenvalue is the major axis of target mask pixels. Intersecting the major axis with the target mask centroid yields the estimated target aspect angle. Figure 57 shows a sequence of six SAR chips with the PCA-estimated and true target aspect angles.
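The PCA step can be sketched as follows, assuming the final binary mask is already available as a NumPy array. The angle convention (measured from the column axis toward the row axis and folded into [0°, 180°)) is an illustrative assumption; the mapping from this image-plane angle to target aspect, including resolution of the 180° ambiguity, is not specified here.

```python
import numpy as np

def major_axis_angle(mask):
    """Estimate the major-axis orientation of a binary target mask, in degrees."""
    rows, cols = np.nonzero(mask)                    # locations where b_ij = 1
    C = np.column_stack((rows, cols)).astype(float)  # n x 2 matrix of pixel locations
    C0 = C - C.mean(axis=0)                          # subtract column-wise means
    sigma = np.cov(C0, rowvar=False)                 # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(sigma)         # symmetric eigen-decomposition
    major = eigvecs[:, np.argmax(eigvals)]           # eigenvector of largest eigenvalue
    # major = (d_row, d_col); an axis has no direction, so fold into [0, 180).
    return float(np.degrees(np.arctan2(major[0], major[1])) % 180.0)
```

Because an eigenvector is defined only up to sign, the estimate carries a 180° ambiguity; any comparison with the true aspect angle has to account for it.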

The processing steps are repeated for each available chip. Figure 58 shows the distribution of errors from the aspect angle estimation experiment. A mean error of 11.34° is within the assumed ±22.5° accuracy of the previous discussion.

Pose estimation in the context of ATR is not new [101, 102, 103, 104, 105]. Two approaches exist in research relative to pose estimation. The first employs adaptive classifiers that must be trained before implementation. An example of this type of pose estimator is a neural network [101, 103]. The other uses image processing techniques to segment the target, then applies various criteria to estimate the target pose [102, 104, 105].
Figure 57. Sequence of six T-72 SAR chips from the MSTAR publicly-available collection with estimated and true angles indicated. Panel errors: 8.78, 8.93, 10.00, 8.87, 10.72, and 15.63.

Figure 58. Distribution of errors when estimating target azimuth using principal component analysis on a target mask. Here 231 samples of the T-72 main battle tank from the MSTAR data collection (17 deg depression angle, target identification SN812) are used. Axis label: Error (degrees).
5.4.6 Target class prevalence

The DCS experiment uses 100 test records of each in-library target type. Since there are 5 hostile target types and 5 friend/neutral target types, the ratio of hostile to friend/neutral is 1:1. One hundred additional records are used to test the classifiers against out-of-library target types. The ratio of in-library to out-of-library target records is 10:1. These class prior probabilities impact classifier performance by simulating operation in a target-rich, target-sparse, or target-friendly equivalent environment.

The impact on system performance of varying target class prevalence is explored in the experiment of Sec. 5.6.2.

5.4.7 Correlation of observations

The DCS experiment assumes that both sensors are located on the same observation platform. Indeed, the DCS data are collected from the same sensor using two polarizations. For the purposes of the experiment, the data are presumed to have come from two different sensors located on the same platform. Thus, the starting aspect angle of an observation sequence from sensor 1 results in the same starting aspect angle for sensor 2. The observation data from sensor 1 and sensor 2 are correlated in that they observe the target from a shared orientation across the observation window.

The  impact  of  altering  sensor  location  on  system  performance  is  explored  in 
the  experiment  of  Sec.  5.6.2. 

5.5  Extended  CID  optimization  framework 

This section presents the formulation for the extended CID optimization framework used in the DCS experiment and defines the pertinent performance measures.

5.5.1 Formulation


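The formulation follows closely that presented in Sec. 4.3.3, where $x$ is a vector of decision variables defined in the MP formulation.

Objective function:

\[
\max_{x \in X} \; \mathrm{TPR}(x) = \frac{P_{TP}(x)}{\text{num looks}} \qquad \text{maximizes true-positive rate} \tag{69}
\]

Subject to:

Warfighter constraints

\[
\begin{aligned}
E_{\mathrm{crit}} &\le 0.1 && \text{upper bound on critical errors} \\
E_{\mathrm{ncrit}} &\le 0.2 && \text{upper bound on non-critical errors} \\
P_{TP} &\ge 0.85 && \text{lower bound on true-positive performance} \\
P_{\mathrm{dec}} &\ge 0.5 && \text{lower bound on declaration performance} \\
P_{\mathrm{ool}} &\ge 0.35 && \text{lower bound on out-of-library performance}
\end{aligned}
\]

Fusion rule constraint

\[
\sum_{i} F_i = 1 \qquad \text{select a single fusion rule}
\]

\[
\text{where } F_i =
\begin{cases}
1 & \text{if $i$th fusion rule used} \\
0 & \text{otherwise}
\end{cases}
\]

Sensor selection constraint

\[
\begin{aligned}
\sum_{j=1}^{S} S_j &\le S && \text{select from available sensors} \\
\sum_{j=1}^{S} S_j &\ge 1 && \text{select at least one sensor}
\end{aligned}
\]

\[
\text{where } S_j =
\begin{cases}
1 & \text{if $j$th sensor used} \\
0 & \text{otherwise}
\end{cases}
\]

Threshold constraints

\[
\begin{aligned}
\theta_{ij} &\ge 0 && \text{lower threshold constraint} \\
\theta_{ij} &\le 1 && \text{upper threshold constraint}
\end{aligned}
\]

where $\theta_{ij}$ is the decision threshold associated with fusion rule $i$ and sensor $j$. The decision threshold may be $\theta_{ROC}$ or $\theta_{REJ}$.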

5.5.2  Performance  Measures 

5.5.2.1 True-positive

The estimate for the true-positive performance measure is the number of true hostile records labeled “hostile” divided by the total number of true hostile records declared. This definition accounts for the added rejection option and the resultant “non-declare” label. The calculation is

\[
\begin{aligned}
P_{TP} &= P(\text{“TOD”} \cup \text{“OH”} \mid (\mathrm{TOD} \cup \mathrm{OH}) \cap \text{declaration}) \\
&= \frac{\mathrm{num}(\text{“TOD”}|\mathrm{TOD} + \text{“TOD”}|\mathrm{OH} + \text{“OH”}|\mathrm{TOD} + \text{“OH”}|\mathrm{OH})}{\mathrm{num}(\text{TOD declared} + \text{OH declared})}
\end{aligned}
\tag{70}
\]
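5.5.2.2 Critical error

Critical error is

\[
P(E_{\mathrm{crit}}) = P\left(\left.
\begin{array}{l}
P(\text{“TOD”} \cap \mathrm{FN}) \cup P(\text{“OH”} \cap \mathrm{FN}) \;\cup \\
P(\text{“FN”} \cap \mathrm{TOD}) \cup P(\text{“FN”} \cap \mathrm{OH})
\end{array}
\;\right|\; \text{declaration}\right)
\tag{71}
\]

Simplification of Eq. 71 makes use of Bayes’ rule, and depends on class prevalence as defined by class prior probabilities. Let P(TOD), P(OH), P(FN), and P(OOL) be the prior probabilities of the four true target classes, and P(“TOD”), P(“OH”), P(“FN”), P(“OOL”), P(“Non-declare”) be the unconditional system label probabilities; then

\[
P(\mathrm{TOD}) + P(\mathrm{OH}) + P(\mathrm{FN}) + P(\mathrm{OOL}) = 1 \tag{72}
\]

\[
P(\text{“TOD”}) + P(\text{“OH”}) + P(\text{“FN”}) + P(\text{“OOL”}) + P(\text{“Non-declare”}) = 1. \tag{73}
\]

The simplified $P(E_{\mathrm{crit}})$ calculation is

\[
P(E_{\mathrm{crit}}) = \frac{
P(\text{“TOD”}|\mathrm{FN})P(\mathrm{FN}) + P(\text{“OH”}|\mathrm{FN})P(\mathrm{FN}) +
P(\text{“FN”}|\mathrm{TOD})P(\mathrm{TOD}) + P(\text{“FN”}|\mathrm{OH})P(\mathrm{OH})
}{1 - P(\text{“Non-declare”})}
\tag{74}
\]

where P(“Non-declare”) is determined by the sum of the class-specific probability of non-declaration

\[
\begin{aligned}
P(\text{“Non-declare”}) ={} & P(\text{“Non-declare”}|\mathrm{TOD})P(\mathrm{TOD}) \\
& + P(\text{“Non-declare”}|\mathrm{OH})P(\mathrm{OH}) \\
& + P(\text{“Non-declare”}|\mathrm{FN})P(\mathrm{FN}) \\
& + P(\text{“Non-declare”}|\mathrm{OOL})P(\mathrm{OOL}).
\end{aligned}
\tag{75}
\]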
5.5.2.3 Non-critical error

As with critical error, non-critical error calculation reverses the order of conditioning seen in the true-positive calculation. Non-critical error calculation involves vertical analysis of the confusion matrix.

Some flexibility exists in choosing which classification errors constitute non-critical errors. For this experiment, non-critical errors consider cross-labeled hostile targets (i.e., TOD labeled “OH” and OH labeled “TOD”) and mis-labeling true classes as out-of-library:

\[
P(E_{\mathrm{ncrit}}) = P\left(\left.
\begin{array}{l}
P(\text{“TOD”}|\mathrm{OH}) \cup P(\text{“OH”}|\mathrm{TOD}) \;\cup \\
P(\text{“OOL”}|\mathrm{TOD}) \cup P(\text{“OOL”}|\mathrm{OH}) \;\cup \\
P(\text{“OOL”}|\mathrm{FN})
\end{array}
\;\right|\; \text{declaration}\right)
\tag{76}
\]

This result simplifies to the following non-critical error calculation incorporating prior class probabilities

\[
P(E_{\mathrm{ncrit}}) = \frac{
\begin{array}{l}
P(\text{“TOD”}|\mathrm{OH})P(\mathrm{OH}) + P(\text{“OH”}|\mathrm{TOD})P(\mathrm{TOD}) \;+ \\
P(\text{“OOL”}|\mathrm{TOD})P(\mathrm{TOD}) + P(\text{“OOL”}|\mathrm{OH})P(\mathrm{OH}) \;+ \\
P(\text{“OOL”}|\mathrm{FN})P(\mathrm{FN})
\end{array}
}{1 - P(\text{“Non-declare”})}
\tag{77}
\]

where P(“Non-declare”) is determined by the sum of the class-specific probability of non-declaration.


5.5.2.4 Declaration

The declaration performance measure captures the percentage of test records which the system labels with one of the true class labels. The complementary measure is the non-declaration performance measure. It tabulates the number of records labeled “Non-declare” by the system:

\[
\begin{aligned}
P_{\mathrm{dec}} &= 1 - P(\text{“Non-declare”}) \\
&= 1 - \left(
\begin{array}{l}
P(\text{“Non-declare”}|\mathrm{TOD})P(\mathrm{TOD}) + P(\text{“Non-declare”}|\mathrm{OH})P(\mathrm{OH}) \\
+\, P(\text{“Non-declare”}|\mathrm{FN})P(\mathrm{FN}) \\
+\, P(\text{“Non-declare”}|\mathrm{OOL})P(\mathrm{OOL})
\end{array}
\right).
\end{aligned}
\tag{78}
\]
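Equations 74, 75, 77, and 78 can be evaluated together from a table of conditional label probabilities and the class priors. The sketch below is illustrative only: the dictionary layout, the function name, and the example numbers are assumptions and are not taken from the experiment.

```python
CLASSES = ("TOD", "OH", "FN", "OOL")

def cid_error_measures(cond, prior):
    """cond[(label, cls)] = P("label" | cls); prior[cls] = P(cls)."""
    def joint(label, cls):
        return cond[(label, cls)] * prior[cls]

    # Eq. 75: total probability of a non-declaration.
    p_nondeclare = sum(joint("Non-declare", c) for c in CLASSES)
    # Eq. 78: declaration performance.
    p_dec = 1.0 - p_nondeclare
    if p_dec <= 0.0:
        raise ValueError("no records are declared; error rates are undefined")
    # Eq. 74: friend/neutral labeled hostile, or hostile labeled friend/neutral.
    crit = (joint("TOD", "FN") + joint("OH", "FN")
            + joint("FN", "TOD") + joint("FN", "OH")) / p_dec
    # Eq. 77: cross-labeled hostiles and in-library records labeled "OOL".
    ncrit = (joint("TOD", "OH") + joint("OH", "TOD") + joint("OOL", "TOD")
             + joint("OOL", "OH") + joint("OOL", "FN")) / p_dec
    return {"dec": p_dec, "crit": crit, "ncrit": ncrit}

# Hypothetical example: one row per true class; columns are the labels
# "TOD", "OH", "FN", "OOL", "Non-declare" (each row sums to 1).
example_rows = {
    "TOD": (0.70, 0.10, 0.05, 0.05, 0.10),
    "OH":  (0.10, 0.70, 0.05, 0.05, 0.10),
    "FN":  (0.02, 0.03, 0.80, 0.05, 0.10),
    "OOL": (0.10, 0.10, 0.20, 0.50, 0.10),
}
labels = ("TOD", "OH", "FN", "OOL", "Non-declare")
example_cond = {(lab, cls): p
                for cls, row in example_rows.items()
                for lab, p in zip(labels, row)}
example_prior = {"TOD": 0.25, "OH": 0.25, "FN": 0.40, "OOL": 0.10}
example_measures = cid_error_measures(example_cond, example_prior)
```

Changing `example_prior` while holding `example_cond` fixed shows how class prevalence alone shifts the critical and non-critical error rates.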

5.5.2.5 Out-of-library

The out-of-library performance measure is a true-positive labeling of “OOL” given an OOL record using horizontal analysis of confusion matrix entries. The estimate for the out-of-library performance measure is the number of true OOL records labeled “OOL” divided by the total number of true OOL records declared:

\[
P_{\mathrm{ool}} = P(\text{“OOL”} \mid \mathrm{OOL}) = \frac{\mathrm{num}(\text{“OOL”} \mid \mathrm{OOL})}{\mathrm{num}(\text{OOL declared})}
\tag{79}
\]

Vertical analysis of out-of-library performance (mis-labeling of true classes as out-of-library) is included in the non-critical error performance measure, but may also be defined as a second type of non-critical error per warfighter preference.

5.6 Results

Results are in two sections. The first provides initial results for a specific set of experimental parameters. The second explores results of a designed experiment where the experimental parameters are varied.
5.6.1 Initial results


The  experiment  settings  for  the  initial  results  include  the  design  of  the  HMM 
and  template  classifier  described  in  Sec.  5.3,  the  data  processing  and  methodology 
of  Sec.  5.2  and  Sec.  5.4,  and  the  CID  framework  of  Sec.  5.5. 

The classifier design, data preparation, and experimental methodology place the two competing classifiers on equal footing. Both classifiers use the same 10-dimensional feature data extracted and interpolated from HRR profiles of sequenced SAR target images to train on the 10-class problem. Test sequences for the two classifier types are drawn from the same SAR data set collected at a depression angle of 10 degrees (versus 6 and 8 degrees for the training set).

Test sequences contain the same number of observations and are considered taken from co-located sensors. There are an equal number of hostile target test records and friend/neutral test records. The ratio of hostile to friend/neutral to out-of-library test records is 5:5:1.

Post-processing classifier output is handled in an equivalent manner. Fusion rules are applied the same way, and the out-of-library discriminator functions are applied the same way for the HMM-based classifier as for the template-based classifier. Labeling the test records as a function of the ROC and rejection region thresholds is performed in the same way for both classifiers.

Both classifiers are evaluated within the same CID optimization framework. The objective function is the same; it maximizes true-positive performance as a function of number of sensor observations. The warfighter constraints are held constant for both classifiers. The minimum true-positive performance is 0.85, the maximum critical error rate is 0.1, the maximum non-critical error rate is 0.2, the minimum declaration rate is 0.5, and the minimum out-of-library performance rate is 0.35. The formulas for determining these performance measures, as shown in Sec. 5.5.2, are applied in the same manner to both types of classifiers.
Table 13. HMM- and template-based system performance comparison

                       Percent feasible                          Mean feasible value                       Opt val
Classifier  Fusion     tp    crit  n-crit  dec   ool   joint     tp    crit  n-crit  dec   ool   joint
HMM         Sensor 1   0.50  0.99  0.50    0.75  0.72  0.23      0.96  0.01  0.09    0.84  0.58  0.91      0.9723
HMM         Sensor 2   0.50  0.99  0.32    0.75  0.75  0.07      0.94  0.04  0.04    0.84  0.65  0.91      0.9556
HMM         Mean       0.50  0.99  0.50    0.75  0.75  0.25      0.96  0.02  0.09    0.84  0.64  0.91      0.9625
HMM         ANN        0.42  0.79  0.37    0.73  0.70  0.06      0.99  0.03  0.05    0.78  0.62  0.98      1.0000
HMM         Label      0.62  1.00  0.85    0.53  0.50  0.12      0.98  0.02  0.07    0.66  0.51  0.92      1.0000
Template    Sensor 1   0.36  0.92  0.40    0.75  0.60  -         0.98  0.05  0.07    0.83  0.48  -         -
Template    Sensor 2   0.36  0.87  0.40    0.75  0.65  0.01      0.98  0.05  0.06    0.82  0.53  0.87      0.8893
Template    Mean       0.38  0.95  0.40    0.75  0.68  0.06      0.98  0.03  0.06    0.83  0.56  0.91      0.9557
Template    ANN        0.37  0.51  0.35    0.68  0.67  -         0.98  0.06  0.06    0.81  0.58  -         -
Template    Label      0.40  0.98  0.69    0.43  0.20  -         0.99  0.03  0.06    0.67  0.38  -         -

Constraint right-hand sides (tp, crit, n-crit, dec, ool, joint): 0.85, 0.1, 0.2, 0.5, 0.35, 0.85

The threshold space over which system performance is examined is the same for both types of classifiers. The ROC threshold $\theta_{ROC}$ varies from 0 to 1 in 0.05 increments, leading to 21 settings. The rejection region half-width threshold $\theta_{REJ}$ varies from 0 to 0.45 in 0.05 increments, leading to 10 settings.

Results for the initial system comparison are shown in Table 13. Performance results are shown for both types of classifiers, HMM-based on top and template-based on bottom. Given a type of classifier, the results are broken down by fusion methodology: first, no fusion, with sensors 1 and 2 operating as independent classifiers; second, a simple mean fusion rule; third, neural network fusion; and finally, boolean (label) fusion.

Performance is measured in two ways. First, a measure of classifier robustness is used. Percent feasible refers to the percentage of settings in the threshold space that result in feasible performance given a certain warfighter constraint. For example, referring to Table 13, of the 21 × 10 = 210 threshold settings for $(\theta_{ROC}, \theta_{REJ})$, 50% result in feasible performance for the true-positive warfighter constraint ($P_{TP} \ge 0.85$) in the case of Sensor 1 acting alone with an HMM-based classifier.
This measure of robustness is applied to each of the five warfighter constraints (true-positive, critical error, non-critical error, declaration, and out-of-library) in addition to the jointly feasible measure of robustness. In the joint case, the robustness measure captures the percentage of the threshold space that produces feasible points across all five constraints simultaneously.

The second measure of performance is the mean feasible value. The mean feasible value is the average value among feasible points for a given performance measure. Again, referring to Table 13, the mean feasible true-positive performance for Sensor 1 acting alone with an HMM-based classifier is 0.96. The values listed with the performance measure labels (0.85, 0.1, 0.2, 0.5, 0.35, 0.85) are the right-hand sides of the warfighter performance constraints. The mean jointly-feasible performance value is the mean true-positive value of the jointly-feasible points. The optimal value is the maximum true-positive value of the jointly-feasible points. If there are no feasible points, a dash (-) is placed in the table at that location.
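The three table quantities (percent feasible, mean feasible value, optimal value) can be computed directly from performance surfaces sampled on the threshold grid. The following is a minimal sketch, assuming each surface is a 2-D array indexed by the ROC and rejection thresholds; the dictionary keys and function name are illustrative, and the constraint senses follow Sec. 5.5.1.

```python
import numpy as np

# Sense and right-hand side of each warfighter constraint.
CONSTRAINTS = {
    "tp": (">=", 0.85),
    "crit": ("<=", 0.10),
    "ncrit": ("<=", 0.20),
    "dec": (">=", 0.50),
    "ool": (">=", 0.35),
}

def feasibility_summary(surfaces):
    """surfaces[name] is a 2-D array of one measure over the threshold grid."""
    masks = {}
    for name, (sense, rhs) in CONSTRAINTS.items():
        values = np.asarray(surfaces[name], dtype=float)
        masks[name] = values >= rhs if sense == ">=" else values <= rhs
    # Jointly feasible: all five constraints hold at the same grid point.
    masks["joint"] = np.logical_and.reduce([masks[n] for n in CONSTRAINTS])

    percent = {name: float(mask.mean()) for name, mask in masks.items()}
    mean_value = {}
    for name, mask in masks.items():
        # The joint column reports true-positive values at jointly feasible points.
        source = "tp" if name == "joint" else name
        values = np.asarray(surfaces[source], dtype=float)
        mean_value[name] = float(values[mask].mean()) if mask.any() else None
    tp = np.asarray(surfaces["tp"], dtype=float)
    joint = masks["joint"]
    optimal = float(tp[joint].max()) if joint.any() else None
    return percent, mean_value, optimal
```

A value of `None` plays the role of the dash in the table: there are no feasible points over which to average or maximize.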

Figures 59 and 60 provide an additional method to analyze system performance. Figure 59 shows system performance for an HMM-based classifier system employing an artificial neural network (ANN) fusion rule. Figure 60 shows system performance for a template-based classifier system using a similarly-trained ANN fusion rule.

Each figure contains six subplots which detail system performance for the five warfighter performance measures plus a sixth subplot which shows joint performance across the five measures. The xy plane in each subplot indicates the ROC and reject threshold settings. The surface above represents system performance for a given warfighter performance measure, such as true-positive rate. Dots are used to indicate the threshold coordinate pairs where performance met the warfighter constraint (feasible points).

The sixth subplot shows the threshold coordinate pairs that are jointly-feasible across the five measures and plots them on the true-positive surface. The maximum value of the jointly-feasible points is also indicated. This value corresponds to the optimal value in Table 13.

Figure 59. Performance surfaces determined by ROC threshold and reject threshold settings with feasible points shown for true-positive rate (top left), critical error rate (top middle), non-critical error rate (top right), out-of-library rate (bottom left), and declaration rate (bottom middle). Bottom right uses the true-positive surface to show jointly-feasible points with optimal point indicated. The system uses an HMM-based classifier, co-located sensors, and a neural network fusion method.

Figure 60. Performance surfaces for template-based classifier at same settings as HMM-based classifier of Fig. 59. Note the lack of jointly-feasible solutions in the bottom-right subplot.


5.6.1.1 Interpretation

Table 13 provides a concise collection of performance measures used to compare both classifier types and the methods used to fuse classifier outputs. To compare the classifier types, a mean value across fusion method is shown for both the robustness measures and the feasible values in Table 14. Clearly, the HMM-based classifier is more robust in the threshold space than the template-based classifier. Indeed, 15% of the threshold space yields jointly-feasible operating points for the HMM-based classifier, while this value is only 1% for the template-based classifier. When comparing the systems based on mean feasible performance measure values, the lack of jointly-feasible operating points brings the template-based mean value down to 0.36, while the HMM classifier performs at 0.93. The mean optimal value also favors the HMM-based classifier at 0.9781 versus 0.3690.

Table 14. Comparison of mean performance for HMM- and template-based systems

            Percent feasible                          Mean feasible value                       Opt val
Classifier  tp    crit  n-crit  dec   ool   joint     tp    crit  n-crit  dec   ool   joint
HMM         0.51  0.95  0.51    0.70  0.68  0.15      0.96  0.02  0.07    0.79  0.60  0.93      0.9781
Template    0.37  0.85  0.45    0.67  0.56  0.01      0.98  0.04  0.06    0.79  0.50  0.36      0.3690

Constraint right-hand sides (tp, crit, n-crit, dec, ool, joint): 0.85, 0.1, 0.2, 0.5, 0.35, 0.85

When comparing fusion methods within a classifier type, some trends are observed. Referring to Table 13, the boolean fusion method provides better than average robustness for true-positive, critical error, and non-critical error, but its robustness lags for declarations and out-of-library feasibility. The mean performance values for the boolean case follow a similar trend, strong in true-positive, critical error, and non-critical error, but weak in declarations and out-of-library performance.

The performance surface plots shown in Figs. 59 and 60 capture results for the artificial neural network (ANN) fusion method for the HMM-based and template-based cases respectively. The plots reveal an HMM advantage in robustness in the critical error performance measure (middle subplot, top row). The HMM surface is lower (less critical error), resulting in more feasible points (79% versus 51% for the template-based classifier). The reduced feasible critical error space for the template-based classifier is the limiting factor in determining the lack of jointly-feasible operating points.

Two  more  plots  similar  to  Figs.  59  and  60  are  given  to  show  performance 
surfaces  for  the  case  of  the  simple  mean  fusion  rule.  Figure  61  shows  surface  plots 
for  the  HMM-based  case  and  the  template-based  case. 

Comparing the plots of Fig. 61 with those of Figs. 59 and 60 yields several observations. First, a comparison of the surfaces for the HMM-based neural fusion and mean fusion shows more feasible space in the true-positive, critical error, non-critical error, and jointly-feasible surfaces for the mean fusion method. However, the neural fusion method has a perfect 1.0 optimal solution versus the mean fusion value of 0.9625 due to a more robust feasible space in the out-of-library performance surface. Further, the mean fusion method produces performance surfaces exhibiting sharp steps rather than the smooth contours of the neural network fusion. This result stems from the neural network mapping from classifier outputs to a [0,1] posterior space, producing a more evenly distributed posterior vector compared to the exponentiated log-likelihood of the mean fusion rule, which produces posteriors grouped tightly near either 0 or 1.

Figure 61. Performance surfaces determined by ROC threshold and reject threshold settings. HMM-based classifier surfaces are shown above the line, and template-based surfaces are shown below the line. Both experiments use co-located sensors and a simple mean fusion method.

The  template-based  mean  fusion  performance  surfaces  shown  below  the  line  in 
Fig.  61  reveal  an  improved  feasible  space  for  the  critical  error  measure.  The  more 
robust  critical  error  feasible  space  allows  several  jointly-feasible  solutions  with  an 
optimal  value  of  0.9557. 

General  trends  evident  no  matter  the  classifier  type  or  fusion  method  include 
trade-offs  between  the  performance  measures  as  a  function  of  location  within  the 
threshold  space.  The  best  true-positive  performance  occurs  in  the  northwest  corner 
of  the  threshold  space  (looking  down  at  the  xy  plane  with  (0,0)  at  southwest  corner). 
This  location  corresponds  to  a  low  ROC  threshold  (aggressive  hostile  declaration) 
and  a  high  rejection  region  threshold  (large  rejection  region  -  only  highly  confident 
records  are  labeled). 

The out-of-library performance surface rises where the true-positive surface falls, in the northeast corner of the threshold space. The best out-of-library performance occurs when the ROC threshold is high (conservative hostile declaration) and the rejection region is large. At this point very few hostile records are declared and the true-positive surface is at 0 in each of the plots.

Critical error peaks are at the southwest and southeast corners, where the rejection region nears 0 (few non-declarations) and the ROC threshold is near 0 (aggressive hostile declaration) or 1 (conservative hostile declaration). The saddle shape of the performance surface reveals whether the classifier-fusion pairing is robust in the critical error sense. If the saddle is low and flat (HMM-mean fusion), then the critical error performance is good. If the saddle is high with large sides (template-neural fusion), then the performance is poor.

Declaration performance is a function of the rejection region; the larger the rejection region, the greater the number of non-declarations, and hence the lower the declaration rate. The largest rejection region occurs when the ROC threshold is at 0.5 and the rejection region half-width threshold is at 0.45. This result yields a rejection region width of 0.9 centered at 0.5. Most plots reach a minimum declaration performance at this location in the threshold space. The HMM-mean fusion plot of Fig. 61, however, shows a relatively large declaration rate (approximately 0.8) at $(\theta_{ROC}, \theta_{REJ}) = (0.5, 0.45)$. This result is explained by the tight grouping of posteriors at 0 and 1 (outside the rejection window) resulting from the exponentiation of large negative log-likelihoods from the HMM classifiers.
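The grouping effect can be seen in a small numerical sketch. The fusion and labeling rules below are assumptions made for illustration (average the per-class log-likelihoods of the two sensors, exponentiate, normalize, and reject when the hostile posterior lies strictly inside the window centered on the ROC threshold); they are not the exact rules of the experiment, and the log-likelihood values are hypothetical.

```python
import numpy as np

def mean_fusion_posterior(loglik_sensor1, loglik_sensor2):
    """Average per-class log-likelihoods, then exponentiate and normalize."""
    fused = 0.5 * (np.asarray(loglik_sensor1, dtype=float)
                   + np.asarray(loglik_sensor2, dtype=float))
    fused = fused - fused.max()      # shift for numerical stability
    p = np.exp(fused)
    return p / p.sum()

def in_rejection_region(p_hostile, theta_roc, theta_rej):
    """True if the posterior falls inside the window of half-width theta_rej."""
    return abs(p_hostile - theta_roc) < theta_rej

# Hypothetical sequence log-likelihoods for (hostile, friend/neutral).
posterior = mean_fusion_posterior([-1200.0, -1230.0], [-1180.0, -1215.0])
rejected = in_rejection_region(posterior[0], theta_roc=0.5, theta_rej=0.45)
```

Because long observation sequences produce log-likelihoods that differ by tens of units between classes, the normalized posterior saturates near 0 or 1 and lands outside even the widest rejection window, so the record is still declared.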

Non-critical error incorporates cross-labeled hostile types (“TOD” for OH, “OH” for TOD) as well as in-library targets mis-labeled as out-of-library records. Thus, the non-critical error surface is influenced by the true-positive and out-of-library performance measures. In the northwest corner of the threshold space, true-positive performance is excellent and out-of-library performance is poor. Many hostile records are labeled correctly and few records are labeled as out-of-library. It is not surprising that the non-critical error surface is at or near 0 for this corner of the threshold space. As true-positive performance falls and more out-of-library labels are made, the non-critical error surface climbs rapidly.
Table 15. Designed experimental settings

Design parameter         Settings                    Purpose
Classifier type          HMM-based                   competing classifiers
                         template-based
Fusion method            Sensor 1                    explore fusion methodologies
                         Sensor 2
                         simple mean fusion
                         ANN fusion
                         label fusion
Sensor location          Co-located sensors          explore correlation of observations
                         Independent sensors
Observation length       short                       explore effects of fewer/more observations
                         medium
                         long
Target class prior       10:1  4:1  2:1              explore target rich versus target sparse
probabilities (H:F)      1:1  1:2  1:4  1:10         operating environments
Prior target aspect      ±22.5°                      explore effects of more accurate initial
knowledge                ±37.5°                      target aspect knowledge
                         none

5.6.2  Designed  experiment  results 

The  results  shown  and  discussed  in  Sec.  5.6.1  correspond  to  specific  experiment 
settings.  These  settings  include  the  prior  probabilities  of  the  target  classes,  the 
location  of  the  sensors,  the  number  of  observations  in  the  test  sequences,  and  the 
level  of  prior  knowledge  of  target  aspect  angle.  For  the  initial  experiment  the  prior 
probabilities  of  the  target  classes  are  held  at  1:1  for  the  hostile  to  friend/neutral 
classes.  Sensors  1  and  2  are  co-located,  meaning  they  observe  the  target  from  the 
same  orientation.  The  initial  experiment  assumes  prior  knowledge  of  the  target 
aspect  angle  to  within  ±22.5°. 

This  section  describes  a  designed  experiment  by  expanding  the  settings  used 
in  the  initial  experiment.  Table  15  shows  the  designed  experimental  settings. 
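
The settings in Table 15 can be enumerated programmatically. The following is a minimal sketch that assumes every combination of settings is run (a full factorial design); the dictionary keys and the function name are illustrative and do not come from the original experiment code.

```python
from itertools import product

# Settings transcribed from Table 15; key names are illustrative assumptions.
DESIGN = {
    "classifier": ["HMM-based", "template-based"],
    "fusion": ["Sensor 1", "Sensor 2", "mean", "ANN", "label"],
    "sensor_location": ["co-located", "independent"],
    "observation_length": ["short", "medium", "long"],
    "prior_ratio_H_F": ["10:1", "4:1", "2:1", "1:1", "1:2", "1:4", "1:10"],
    "prior_aspect": ["+/-22.5 deg", "+/-37.5 deg", "none"],
}


def enumerate_design(design):
    """Return every combination of settings as a list of dicts."""
    names = list(design)
    combos = product(*(design[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


runs = enumerate_design(DESIGN)
print(len(runs))  # 2 * 5 * 2 * 3 * 7 * 3 combinations
```

Under the full-factorial assumption, each dictionary in `runs` identifies one cell of the designed experiment.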

Figure 62. Co-located sensors on a single platform sweep out an observation window
$\theta$ beginning at $\alpha$ and ending at $\beta = \alpha + \theta$. Independent sensors
are located on separate platforms and sweep out observation windows
$\theta_1 = \theta_2$ beginning at different starting angles $\alpha_1 \neq \alpha_2$.


As  in  the  initial  experiment,  CID  systems  based  on  two  types  of  classifier  are 
in  competition.  The  multiple  classifier  systems  are  fused  using  the  same  set  of  fusion 
methods  as  the  initial  experiment. 

The designed experiment includes an additional sensor location setting. Originally,
the sensors are co-located. The designed experiment allows the sensors to
be located on different platforms, which means that the sensor-to-target orientation
is no longer directly correlated. Rather, the observation sequences may begin
at different target aspect angles for each sensor, which corresponds to two sensors
simultaneously observing the same target from two different orientations as shown
in Fig. 62.

The  medium  observation  length  is  that  used  in  the  initial  experiment.  The 
designed  experiment  uses  a  reduced  observation  sequence  (short)  and  an  extended 
observation  sequence  (long)  to  explore  the  effect  of  more  observations  on  classification 
performance.  Intuitively,  more  observations  means  more  data  on  which  the  classifiers 
can  act,  which  should  reduce  error.  However,  the  feature  data  is  a  noisy  function 
of  aspect  angle,  and  a  longer  observation  sequence  may  introduce  more  target  class 
confusion  compared  to  a  short  observation  sequence. 




The  warfighter  is  interested  in  CID  system  performance  across  a  variety  of 
operational  conditions.  One  of  these  conditions  is  the  prevalence  of  hostile  targets 
relative to friendly or neutral objects. The initial experiment held the prior probability
of the hostile classes equal to the prior probability of the friend/neutral classes
(1:1).  The  designed  experiment  explores  various  H:F  prior  probability  settings  and 
their  effects  on  performance.  Table  15  lists  these  settings;  they  vary  from  a  target 
dense  environment  (10:1)  to  a  target  sparse  environment  (1:10). 

The  last  designed  experimental  parameter  is  the  level  of  knowledge  of  the  target 
initial  aspect  angle.  Knowing  a  priori  the  target’s  pose  relative  to  the  sensor  means 
searching  a  smaller  template  space  for  a  match,  thus  reducing  error.  The  initial 
setting was knowledge within ±22.5° of the target's true initial aspect angle. This
assumption  was  verified  by  the  pose  estimation  calculation  of  Sec.  5.4.5.  The  initial 
setting  is  supplemented  with  a  degraded  setting  (±37.5°)  and  a  setting  where  no 
prior  knowledge  of  the  target’s  initial  aspect  is  used. 
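
The effect of prior aspect knowledge on the template search space can be illustrated with a short sketch. It assumes, purely for illustration, that templates are indexed by aspect angle at a 5° spacing; the function names and the spacing are hypothetical and are not taken from the classifier implementation.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def candidate_templates(template_angles, estimated_aspect, half_width=None):
    """Keep templates within +/- half_width degrees of the aspect estimate.

    half_width=None models the 'no prior knowledge' setting: every template is searched.
    """
    if half_width is None:
        return list(template_angles)
    return [t for t in template_angles
            if angular_difference(t, estimated_aspect) <= half_width]


# Hypothetical template library: one template every 5 degrees of aspect.
template_angles = [float(a) for a in range(0, 360, 5)]

for half_width in (22.5, 37.5, None):
    kept = candidate_templates(template_angles, 10.0, half_width)
    print(half_width, len(kept))
```

A tighter window leaves fewer candidate templates, which is the mechanism by which better aspect knowledge is expected to reduce confusion.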

5.6.2.1 Prior knowledge of target aspect

This  section  discusses  the  impact  of  prior  knowledge  of  target  aspect  based 
on  results  for  the  HMM-based  and  template-based  classifier  systems.  Table  16  and 
Table 17 capture performance measures for an HMM-based system and a template-based
system, respectively. The results are obtained from an experiment with co-located
sensors, a medium observation length, and equal prior probabilities for hostile
and  friend/neutral  classes. 

Within  the  HMM-based  results  (Table  16)  one  sees  a  slight  trend  down  in 
measures of robustness as the prior knowledge of target aspect worsens. The HMM-based
system achieves jointly-feasible solutions in every case except Sensor 2 acting
independently in the ±37.5° case. For the template-based system (Table 17),
jointly-feasible solutions exist in only 9 of 15 cases, and of these the number of jointly-feasible

Table 16. Performance comparison of prior aspect knowledge for HMM-based classifier,
co-located sensors, medium observation length, and equal priors

                  Percent feasible                       Mean feasible value                    Opt val
Aspect  Fusion    tp    crit  n-crit  dec   ool   joint  tp    crit  n-crit  dec   ool   joint
                                                         0.85  0.1   0.2     0.5   0.35  0.85
±22.5°  Sensor 1  0.50  0.99  0.50    0.75  0.72  0.23   0.96  0.01  0.09    0.84  0.58  0.91   0.9723
        Sensor 2  0.50  0.99  0.32    0.75  0.75  0.07   0.94  0.04  0.04    0.84  0.65  0.91   0.9556
        Mean      0.50  0.99  0.50    0.75  0.75  0.25   0.96  0.02  0.09    0.84  0.64  0.91   0.9625
        ANN       0.42  0.79  0.37    0.73  0.70  0.06   0.99  0.03  0.05    0.78  0.62  0.98   1.0000
        Label     0.62  1.00  0.85    0.53  0.50  0.12   0.98  0.02  0.07    0.66  0.51  0.92   1.0000
±37.5°  Sensor 1  0.50  0.99  0.55    0.75  0.69  0.20   0.96  0.01  0.08    0.84  0.53  0.92   0.9855
        Sensor 2  0.32  0.99  0.55    0.75  0.68  -      0.98  0.04  0.06    0.85  0.57  -      -
        Mean      0.50  0.99  0.55    0.75  0.68  0.19   0.96  0.02  0.06    0.84  0.57  0.90   0.9014
        ANN       0.42  0.81  0.42    0.74  0.65  0.05   0.98  0.04  0.06    0.79  0.57  0.90   0.9567
        Label     0.56  1.00  0.80    0.53  0.46  0.09   0.97  0.01  0.05    0.66  0.47  0.89   0.9186
none    Sensor 1  0.50  0.99  0.50    0.75  0.75  0.25   0.96  0.04  0.10    0.84  0.56  0.92   0.9533
        Sensor 2  0.32  0.92  0.50    0.75  0.70  0.02   0.97  0.06  0.09    0.84  0.63  0.87   0.8773
        Mean      0.50  0.99  0.32    0.75  0.75  0.07   0.96  0.04  0.06    0.83  0.61  0.93   0.9499
        ANN       0.38  0.84  0.44    0.74  0.65  0.03   0.98  0.06  0.07    0.81  0.60  0.88   0.9057
        Label     0.62  1.00  0.85    0.39  0.45  0.10   0.98  0.02  0.07    0.66  0.47  0.91   0.9357

points  is  quite  small  (fewer  than  5  points)  versus  the  larger  number  for  the  HMM 
case  (20  -  30  points). 
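
The "percent feasible" and "joint" columns of Tables 16 and 17 can be read as counts over a grid of (ROC threshold, reject threshold) points. The sketch below is an illustration only: the bounds are the values printed in the table header row, but the direction of each inequality and the data layout are assumptions, and the four grid points are made up.

```python
# Bounds from the header row of Tables 16 and 17.
# Assumption: tp, dec, and ool are floors; crit and n_crit are ceilings.
BOUNDS = {
    "tp": (">=", 0.85),
    "crit": ("<=", 0.10),
    "n_crit": ("<=", 0.20),
    "dec": (">=", 0.50),
    "ool": (">=", 0.35),
}


def is_feasible(measure, value):
    """Check one performance value against its bound."""
    op, bound = BOUNDS[measure]
    return value >= bound if op == ">=" else value <= bound


def feasibility_summary(grid):
    """Fraction of threshold-space points feasible per measure and jointly.

    grid: list of dicts, one per threshold setting, mapping measure -> value.
    """
    n = len(grid)
    summary = {m: sum(is_feasible(m, p[m]) for p in grid) / n for m in BOUNDS}
    summary["joint"] = sum(
        all(is_feasible(m, p[m]) for m in BOUNDS) for p in grid
    ) / n
    return summary


# Four made-up threshold settings, for illustration only.
grid = [
    {"tp": 0.96, "crit": 0.01, "n_crit": 0.09, "dec": 0.84, "ool": 0.58},
    {"tp": 0.90, "crit": 0.05, "n_crit": 0.25, "dec": 0.80, "ool": 0.40},
    {"tp": 0.70, "crit": 0.02, "n_crit": 0.10, "dec": 0.90, "ool": 0.60},
    {"tp": 0.95, "crit": 0.15, "n_crit": 0.30, "dec": 0.40, "ool": 0.20},
]
summary = feasibility_summary(grid)
print(summary)
```

A point is jointly feasible only when it satisfies every bound at once, which is why the joint fraction can be far smaller than any single-measure fraction.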

Figure  63  singles  out  the  case  where  Sensor  1  acts  independently  with  no  prior 
target aspect information. From Tables 16 and 17, one sees that the two systems share an optimal
solution  value  of  approximately  0.95,  but  the  HMM-based  classifier  outperforms  the 
template-based  classifier  when  considering  the  size  of  the  feasible  regions.  Focusing 
on  the  surface  plots  of  Fig.  63,  one  sees  that  the  two-tiered  true-positive  surface  of 
the  HMM-based  classifier  affords  a  larger  feasible  region  (and  larger  jointly-feasible 
region)  than  the  template-based  classifier.  The  critical  error  surface  in  the  HMM 
case  is  markedly  better  than  the  template  case.  Combined,  the  differences  between 
the  HMM  and  template  case  for  the  true-positive,  critical  error,  and  non-critical 
error  feasible  regions  restrict  the  template-based  jointly-feasible  space  to  a  single 
point  while  giving  the  HMM-based  classifier  a  jointly-feasible  region  that  is  fully 
one-quarter  of  the  threshold  space. 

Figure 63. Performance surfaces determined by ROC threshold and reject threshold
settings. HMM-based classifier surfaces are shown above the line,
and template-based surfaces are shown below the line. Sensor 1 performance
is with equal priors, medium observation length, and no prior
knowledge of target aspect.

Table 17. Performance comparison of prior aspect knowledge for template-based
classifier, co-located sensors, medium observation length, and equal priors

                  Percent feasible                       Mean feasible value                    Opt val
Aspect  Fusion    tp    crit  n-crit  dec   ool   joint  tp    crit  n-crit  dec   ool   joint
                                                         0.85  0.1   0.2     0.5   0.35  0.85
±22.5°  Sensor 1  0.36  0.92  0.40    0.75  0.60  -      0.98  0.05  0.07    0.83  0.48  -      -
        Sensor 2  0.36  0.87  0.40    0.75  0.65  0.01   0.98  0.05  0.06    0.82  0.53  0.87   0.8893
        Mean      0.38  0.95  0.40    0.75  0.68  0.06   0.98  0.03  0.06    0.83  0.56  0.91   0.9557
        ANN       0.37  0.51  0.35    0.68  0.67  -      0.98  0.06  0.06    0.81  0.58  -      -
        Label     0.40  0.98  0.69    0.43  0.20  -      0.99  0.03  0.06    0.67  0.38  -      -
±37.5°  Sensor 1  0.37  0.73  0.34    0.75  0.69  0.02   0.98  0.06  0.08    0.81  0.51  0.94   0.9532
        Sensor 2  0.35  0.83  0.40    0.73  0.66  0.02   0.98  0.06  0.06    0.83  0.56  0.87   0.8832
        Mean      0.38  0.80  0.37    0.75  0.68  0.04   0.98  0.04  0.07    0.81  0.55  0.93   0.9594
        ANN       0.38  0.59  0.34    0.70  0.69  0.00   0.99  0.06  0.08    0.79  0.67  0.97   0.9746
        Label     0.40  0.84  0.99    0.39  0.02  -      0.98  0.03  0.09    0.64  0.36  -      -
none    Sensor 1  0.36  0.54  0.32    0.71  0.70  0.00   0.98  0.05  0.07    0.80  0.65  0.96   0.9564
        Sensor 2  0.33  0.49  0.35    0.69  0.66  -      0.99  0.06  0.07    0.81  0.60  -      -
        Mean      0.36  0.59  0.34    0.70  0.69  0.00   0.98  0.05  0.06    0.81  0.62  0.95   0.9450
        ANN       0.35  0.74  0.39    0.71  0.62  -      0.98  0.07  0.08    0.80  0.53  -      -
        Label     0.38  0.77  0.73    0.31  0.43  0.00   0.99  0.03  0.07    0.64  0.48  0.88   0.9363


5.6.2.2 Target class prior probabilities


The  designed  experiment  explores  the  effects  of  target  class  prior  probabilities 
across  a  range  of  settings  from  target  dense  (10:1)  to  target  sparse  (1:10).  Table  18 
shows  comparative  results  for  an  HMM-based  system  and  a  template-based  system 
using  a  single  sensor  (Sensor  1)  at  the  long  observation  length  setting  with  prior 
knowledge  of  the  target  aspect  to  within  ±22.5°.  Results  span  the  settings  for  target 
class  prior  probabilities. 

One  notices  that  the  template-based  system  is  relatively  robust  to  changes 
in  the  prior  probabilities  of  the  target  classes.  The  HMM-based  classifier  performs 
well  in  the  target  rich  environment,  but  there  is  a  break  point  moving  from  equal 
priors  (1:1)  to  target  sparse  (1:2)  and  beyond,  where  the  non-critical  feasibility  space 
shrinks  by  half  and  eliminates  all  jointly-feasible  solutions. 

Table 18. Performance comparisons across target class prior probability settings:
Sensor 1, long observation length, and ±22.5° target aspect knowledge

                          Percent feasible                       Mean feasible value                    Opt val
Classifier  Priors (H:F)  tp    crit  n-crit  dec   ool   joint  tp    crit  n-crit  dec   ool   joint
                                                                 0.85  0.1   0.2     0.5   0.35  0.85
HMM         10:1          0.50  1.00  0.50    0.74  0.75  0.25   0.97  0.01  0.04    0.83  0.66  0.94   0.9884
            4:1           0.50  0.99  0.50    0.74  0.75  0.25   0.97  0.01  0.06    0.81  0.66  0.94   0.9884
            2:1           0.50  0.99  0.50    0.78  0.75  0.25   0.97  0.01  0.07    0.80  0.66  0.94   0.9884
            1:1           0.50  0.99  0.50    0.75  0.75  0.25   0.97  0.00  0.09    0.84  0.66  0.94   0.9884
            1:2           0.50  0.99  0.25    0.75  0.75  -      0.97  0.00  0.01    0.89  0.66  -      -
            1:4           0.50  0.99  0.25    0.75  0.75  -      0.97  0.00  0.01    0.93  0.66  -      -
            1:10          0.50  1.00  0.25    0.75  0.75  -      0.97  0.01  0.01    0.96  0.66  -      -
Template    10:1          0.36  0.85  0.39    0.76  0.65  0.01   0.99  0.03  0.06    0.75  0.53  0.95   0.9500
            4:1           0.36  0.89  0.39    0.76  0.65  0.02   0.99  0.04  0.06    0.77  0.53  0.94   0.9500
            2:1           0.36  0.92  0.40    0.76  0.65  0.02   0.99  0.04  0.06    0.79  0.53  0.94   0.9500
            1:1           0.36  0.91  0.40    0.75  0.65  0.02   0.99  0.04  0.06    0.82  0.53  0.94   0.9500
            1:2           0.36  0.88  0.40    0.75  0.65  0.02   0.99  0.03  0.06    0.86  0.53  0.94   0.9500
            1:4           0.36  0.77  0.41    0.74  0.65  0.02   0.99  0.02  0.07    0.89  0.53  0.94   0.9500
            1:10          0.36  0.72  0.42    0.74  0.65  0.02   0.99  0.02  0.07    0.92  0.53  0.94   0.9500

Figure  64  shows  the  performance  surfaces  for  target  rich  (10:1)  and  target 
sparse  (1:10)  environments  for  the  HMM-based  single  sensor  system  operating  with 
long observation sequences and ±22.5° target aspect knowledge. One sees the encroachment
of the infeasible portion of the non-critical error surface as the target
priors  shift  from  10:1  to  1:10.  The  combination  of  decreased  declaration  performance 
and  infeasible  non-critical  error  surface  removes  all  jointly-feasible  solutions. 


Relative changes in target class prior probabilities do not affect performance
measures based on horizontal analysis of confusion matrices. Horizontal analysis
focuses on the number of true class records that are correctly labeled given that a
declaration is made. Suppose an initial experiment uses equal class prior probabilities
and the classifier's estimated true-positive performance for a given target class
i is 90%; then 90 of 100 class i records are correctly labeled, given that a declaration is
made on every record. In a follow-up experiment using a target sparse class prior
probability of 1:10, 9 of 10 records from class i are correctly labeled. In either case
the estimated true-positive performance for class i for the classifier is 90%. Evidence

Figure 64. Performance surfaces determined by ROC threshold and reject threshold
settings. HMM-based classifier surfaces for a target dense environment
(10:1) are above the line; surfaces for a target sparse environment
(1:10) are below the line. Sensor 1 performance, long observation
length, and ±22.5° target aspect knowledge are used.

of this result can be seen in the top and bottom subplots of Fig. 64 for true-positive,
out-of-library, and jointly-feasible (true-positive) performance surfaces. Whether the
class prior probabilities are 10:1 (above the line) or 1:10 (below the line), the surface
shapes do not change for these horizontal analysis performance measures.

Vertical analysis of the confusion matrix measures the probability of correct labeling,
that is, how often a class i label is correct given that the classifier applies it. The
critical error, non-critical error, and declaration performance measures use vertical
analysis of the confusion matrices and are influenced by the class prior probabilities.
Evidence of this influence can be seen in the different surface shapes for these
performance measures in Fig. 64.
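
The distinction between horizontal and vertical analysis can be checked numerically. The sketch below uses a hypothetical two-class (hostile H, friend/neutral F) classifier with no rejection; the 90% true-positive rate echoes the example above, while the 20% rate at which F records are labeled H is an invented value for illustration.

```python
def confusion_counts(rates, class_counts):
    """Build counts[true][label] from rates[true][label] = P(label | true class)."""
    return {t: {lab: rates[t][lab] * class_counts[t] for lab in rates[t]}
            for t in rates}


def horizontal_rate(counts, cls):
    """Row-wise: fraction of true-class records that receive the correct label."""
    return counts[cls][cls] / sum(counts[cls].values())


def vertical_rate(counts, cls):
    """Column-wise: fraction of records labeled cls that truly belong to cls."""
    return counts[cls][cls] / sum(counts[t][cls] for t in counts)


# Hypothetical per-class labeling rates; they do not depend on the priors.
RATES = {"H": {"H": 0.9, "F": 0.1}, "F": {"H": 0.2, "F": 0.8}}

dense = confusion_counts(RATES, {"H": 1000, "F": 100})    # 10:1
sparse = confusion_counts(RATES, {"H": 100, "F": 1000})   # 1:10

for name, counts in (("10:1", dense), ("1:10", sparse)):
    print(name,
          round(horizontal_rate(counts, "H"), 3),
          round(vertical_rate(counts, "H"), 3))
```

The horizontal (row-wise) rate for H is the same under both priors, whereas the vertical (column-wise) rate falls sharply in the target sparse case because most records labeled H are then mislabeled F records.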

5.6.2.3 Observation sequence length

The  designed  experiment  explores  the  effects  of  changing  observation  sequence 
length.  The  initial  experiment  uses  a  medium  observation  sequence  length  setting 
of  5°  of  target  azimuth.  A  short  observation  length  setting  uses  2°  of  target  azimuth 
and  a  long  setting  uses  10°. 

Table  19  shows  performance  results  for  the  various  observation  length  settings 
for an HMM-based system with co-located sensors, equal class prior probabilities,
and  no  prior  target  aspect  knowledge.  Table  20  shows  results  at  the  same  settings 
but  for  a  template-based  CID  system. 

Table  19  shows  a  general  improvement  in  robustness  with  increased  observation 
sequence  length.  This  result  can  be  seen  in  the  percentage  of  the  threshold  space  that 
is jointly-feasible. For the short observation length setting, only the label (boolean) fusion
method shows any jointly-feasible settings. At the medium setting, all methods show
jointly-feasible solutions. At the long setting, the size of the jointly-feasible space
increases (except for Sensor 2, which suffered a decrease in non-critical feasibility),
with best performance occurring for the label (boolean) fusion method.




Table 19. Performance comparison of observation sequence lengths for HMM-based
classifier, co-located sensors, equal priors, and no prior target aspect knowledge

                      Percent feasible                       Mean feasible value                    Opt val
Obs length  Fusion    tp    crit  n-crit  dec   ool   joint  tp    crit  n-crit  dec   ool   joint
                                                             0.85  0.1   0.2     0.5   0.35  0.85
Short       Sensor 1  0.44  0.99  0.31    0.76  0.68  -      0.95  0.07  0.08    0.83  0.55  -      -
            Sensor 2  0.26  -     0.26    0.75  0.74  -      1.00  -     0.03    0.84  0.54  -      -
            Mean      0.50  0.74  0.25    0.78  0.75  -      0.96  0.07  0.02    0.81  0.51  -      -
            ANN       0.35  0.72  0.42    0.71  0.56  -      0.98  0.06  0.08    0.82  0.45  -      -
            Label     0.60  0.97  0.62    0.39  0.51  0.11   0.97  0.06  0.07    0.67  0.46  0.88   0.9431
Medium      Sensor 1  0.50  0.99  0.50    0.75  0.75  0.25   0.96  0.04  0.10    0.84  0.56  0.92   0.9533
            Sensor 2  0.32  0.92  0.50    0.75  0.70  0.02   0.97  0.06  0.09    0.84  0.63  0.87   0.8773
            Mean      0.50  0.99  0.32    0.75  0.75  0.07   0.96  0.04  0.06    0.83  0.61  0.93   0.9499
            ANN       0.38  0.84  0.44    0.74  0.65  0.03   0.98  0.06  0.07    0.81  0.60  0.88   0.9057
            Label     0.62  1.00  0.85    0.39  0.45  0.10   0.98  0.02  0.07    0.66  0.47  0.91   0.9357
Long        Sensor 1  0.50  0.99  0.55    0.75  0.75  0.25   0.95  0.02  0.08    0.84  0.65  0.90   0.9342
            Sensor 2  0.26  0.99  0.25    0.75  0.75  -      1.00  0.06  0.01    0.85  0.61  -      -
            Mean      0.50  0.99  0.50    0.75  0.75  0.25   0.94  0.03  0.08    0.84  0.71  0.89   0.9394
            ANN       0.39  0.71  0.47    0.74  0.64  0.02   0.98  0.05  0.07    0.81  0.54  0.91   0.9129
            Label     0.56  0.99  0.85    0.53  0.54  0.17   0.96  0.02  0.07    0.64  0.45  0.89   0.9652



Table 20. Performance comparison of observation sequence lengths for template-based
classifier, co-located sensors, equal priors, and no prior target aspect knowledge

                      Percent feasible                       Mean feasible value                    Opt val
Obs length  Fusion    tp    crit  n-crit  dec   ool   joint  tp    crit  n-crit  dec   ool   joint
                                                             0.85  0.1   0.2     0.5   0.35  0.85
Short       Sensor 1  0.32  0.37  0.29    0.70  0.70  -      0.99  0.07  0.07    0.80  0.53  -      -
            Sensor 2  0.31  0.29  0.32    0.70  0.68  -      0.99  0.07  0.07    0.79  0.53  -      -
            Mean      0.32  0.49  0.30    0.70  0.69  -      0.99  0.07  0.07    0.79  0.54  -      -
            ANN       0.34  0.21  0.37    0.67  0.63  -      0.98  0.08  0.09    0.82  0.54  -      -
            Label     0.34  0.59  0.67    0.29  0.36  0.00   0.99  0.05  0.09    0.63  0.48  0.89   0.9246
Medium      Sensor 1  0.36  0.54  0.32    0.71  0.70  0.00   0.98  0.05  0.07    0.80  0.65  0.96   0.9564
            Sensor 2  0.33  0.49  0.35    0.69  0.66  -      0.99  0.06  0.07    0.81  0.60  -      -
            Mean      0.36  0.59  0.34    0.70  0.69  0.00   0.98  0.05  0.06    0.81  0.62  0.95   0.9450
            ANN       0.35  0.74  0.39    0.71  0.62  -      0.98  0.07  0.08    0.80  0.53  -      -
            Label     0.38  0.77  0.73    0.31  0.43  0.00   0.99  0.03  0.07    0.64  0.48  0.88   0.9363
Long        Sensor 1  0.36  0.54  0.37    0.69  0.63  -      0.99  0.05  0.09    0.83  0.52  -      -
            Sensor 2  0.35  0.43  0.31    0.70  0.67  -      0.98  0.06  0.10    0.79  0.54  -      -
            Mean      0.38  0.52  0.30    0.71  0.69  -      0.98  0.04  0.06    0.78  0.58  -      -
            ANN       0.35  0.57  0.44    0.71  0.60  -      0.98  0.08  0.09    0.82  0.53  -      -
            Label     0.39  0.75  0.85    0.31  0.18  -      0.98  0.03  0.09    0.65  0.38  -      -

Table 20 shows a general improvement in robustness from short to medium
observation sequence length. Notably, the critical error feasibility space improves
and the out-of-library performance improves, yielding jointly-feasible solutions for
three of the five fusion methods.

At  the  long  observation  sequence  length  setting,  the  template-based  classifier 
performance fell slightly in several areas, reducing the feasible space in those performance
categories and removing all jointly-feasible solutions. The additional information
gained from a longer observation sequence did not benefit the template-based
classifier.  The  mean  vector  and  covariance  matrix  employed  in  the  template-based 
classification  methodology  may  have  been  corrupted  by  additional  noise  contained 
in  the  longer  observation  sequences. 

Figure  65  shows  the  performance  surfaces  for  the  cases  of  short  observation 
length  and  long  observation  length  for  the  HMM-based  co-located  sensors  using 
the  mean  fusion  method,  equal  class  prior  probabilities,  and  no  prior  target  aspect 
knowledge.  Marked  improvement  in  the  critical  error,  non-critical  error,  and  out-of- 
library  surfaces  is  evident. 

The critical error surface drops several percentage points and becomes almost
entirely feasible. The non-critical error surface drops and increases its feasible region.
Non-critical error is the limiting factor behind the lack of jointly-feasible
solutions at the short observation length. The improvement in non-critical feasibility
creates a sizable jointly-feasible space at the long observation length. Out-of-library
performance improves, but the size of its feasible region does not.

5.6.2.4 Sensor correlation

The  last  area  explored  by  the  designed  experiment  is  the  location  of  sensors. 
Two  settings  are  used;  the  first  has  the  sensors  co-located  on  a  single  platform,  the 
second  has  the  sensors  located  on  separate  platforms.  The  co-located  sensors  produce 

Figure 65. Performance surfaces determined by ROC threshold and reject threshold
settings. HMM-based classifier surfaces for short observation
lengths are above the line, and surfaces for long observation lengths are
shown below the line. Co-located sensors, mean fusion method, equal
target class priors, and no prior target aspect knowledge are used.

Table 21. Performance comparison of sensor location for template-based classifier,
equal priors, long observation length, and no prior target aspect knowledge

                       Percent feasible                       Mean feasible value                    Opt val
Sensors      Fusion    tp    crit  n-crit  dec   ool   joint  tp    crit  n-crit  dec   ool   joint
                                                              0.85  0.1   0.2     0.5   0.35  0.85
Co-located   Sensor 1  0.36  0.54  0.37    0.69  0.63  -      0.99  0.05  0.09    0.83  0.52  -      -
             Sensor 2  0.35  0.43  0.31    0.70  0.67  -      0.98  0.06  0.10    0.79  0.54  -      -
             Mean      0.38  0.52  0.30    0.71  0.69  -      0.98  0.04  0.06    0.78  0.58  -      -
             ANN       0.35  0.57  0.44    0.71  0.60  -      0.98  0.08  0.09    0.82  0.53  -      -
             Label     0.39  0.75  0.85    0.31  0.18  -      0.98  0.03  0.09    0.65  0.38  -      -
Independent  Sensor 1  0.37  0.52  0.33    0.71  0.70  0.01   0.98  0.06  0.06    0.80  0.61  0.96   0.9644
             Sensor 2  0.34  0.46  0.29    0.71  0.70  -      0.98  0.06  0.09    0.78  0.57  -      -
             Mean      0.36  0.63  0.34    0.70  0.70  0.01   0.98  0.05  0.08    0.80  0.60  0.96   0.9571
             ANN       0.35  0.62  0.30    0.73  0.72  0.01   0.98  0.07  0.06    0.79  0.64  0.94   0.9433
             Label     0.39  0.80  0.67    0.26  0.39  0.01   0.99  0.03  0.07    0.63  0.47  0.91   0.9831

observations  that  begin  and  end  at  shared  target  azimuth  points.  The  independent 
sensors  produce  observation  sequences  of  the  same  aspect  window  size  but  that  begin 
and  end  at  different  target  azimuth  points  (see  Fig.  62). 
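
The two sensor location settings can be expressed as observation windows over target azimuth. The sketch below is illustrative only: the function names are hypothetical, the 10° width corresponds to the long observation setting, and the starting angles are arbitrary.

```python
def observation_window(start_deg, width_deg):
    """Return (start, end) azimuth of a sweep, each wrapped to [0, 360)."""
    return (start_deg % 360.0, (start_deg + width_deg) % 360.0)


def sensor_windows(width_deg, alpha_1, alpha_2=None):
    """Windows for Sensor 1 and Sensor 2.

    Co-located sensors share the starting angle alpha_1 (alpha_2 omitted).
    Independent sensors use the same window width but different starting angles.
    """
    if alpha_2 is None:
        alpha_2 = alpha_1
    return (observation_window(alpha_1, width_deg),
            observation_window(alpha_2, width_deg))


co_located = sensor_windows(10.0, 40.0)
independent = sensor_windows(10.0, 40.0, 355.0)
print(co_located)
print(independent)
```

In the co-located case both windows coincide, so the two sensors view the same span of target azimuth; in the independent case the second window covers a different span of the same size.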

Table  21  shows  performance  results  for  the  two  sensor  location  settings  for  a 
template-based  system  with  equal  class  prior  probabilities,  long  observation  length, 
and  no  prior  target  aspect  knowledge. 

Table 21 indicates an improvement in Sensor 1 performance which leads to
jointly-feasible solutions for the fused methods (mean, ANN, and label) in the
independent sensor case. The slight improvement in the size of the declaration and
out-of-library feasible spaces for Sensor 2 translates into benefits when fused with
Sensor 1. Performance gains can be attributed to the additional information resulting
from viewing the target from two different orientations (independent sensors) versus
a single orientation (co-located platform). If a single platform presents the target at
an  orientation  where  poor  target  discrimination  information  exists,  then  co-located 
sensor  performance  is  poor.  Given  the  same  orientation  for  one  of  two  independent 
sensors, the second orientation may mitigate the poor performance by supplying
information with greater discriminatory power.

Figure 66 shows the performance surfaces for the cases of co-located sensors
and independent sensors for the template-based system using the mean fusion method,
equal class prior probabilities, no prior target aspect knowledge, and long observation
sequence length.

Slight improvements in critical error, non-critical error, and out-of-library feasibility
spaces lead to several jointly-feasible solutions. The additional information
provided  by  independent  sensors  results  in  slight  enhancement  of  the  fused  feasible 
regions,  leading  to  jointly-feasible  solutions  where  none  existed  for  the  co-located 
sensor  arrangement. 

5.6.2.5 Label fusion sensor thresholds

The  surface  plots  used  in  the  previous  sections  showed  performance  of  all  of 
the  fusion  methods  except  label  fusion.  The  label  fusion  methodology  used  in  both 
the  initial  and  designed  experiments  employs  a  set  of  ROC  and  rejection  region 
thresholds  for  each  classifier.  The  labeling  of  test  records  occurs  at  each  classifier, 
whereupon  the  labels  are  fused  via  a  set  of  label  fusion  rules.  In  the  case  of  the  other 
fusion  methods,  classifier  outputs  are  fused  and  then  labeled  with  a  single  application 
of the ROC and rejection region thresholds. Thus, the label fusion method has a
four-dimensional threshold space, versus the two-dimensional space of the other fusion
methods.
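
The difference between labeling after fusion and fusing after labeling can be sketched as follows. Everything in this example is an assumption made for illustration: the scores are taken to lie in [0, 1], the rejection threshold is modeled as a band around the ROC threshold, and the label fusion rule (agreement wins, a non-declaration defers to the other classifier, conflict yields a non-declaration) is hypothetical rather than the rule set used in the experiments.

```python
def label(score, roc_t, reject_w):
    """Hypothetical labeling rule: non-declare ('ND') inside a band around roc_t."""
    if abs(score - roc_t) < reject_w:
        return "ND"
    return "H" if score >= roc_t else "F"


def mean_fusion(s1, s2, roc_t, reject_w):
    """Fuse scores first, then label once: two thresholds in total."""
    return label((s1 + s2) / 2.0, roc_t, reject_w)


def label_fusion(s1, s2, thr1, thr2):
    """Label at each classifier with its own thresholds, then fuse the labels.

    thr1 and thr2 are (roc_t, reject_w) pairs, so four thresholds in total.
    The fusion rule below is an illustrative assumption.
    """
    l1, l2 = label(s1, *thr1), label(s2, *thr2)
    if l1 == l2:
        return l1
    if "ND" in (l1, l2):
        return l2 if l1 == "ND" else l1
    return "ND"


def grid_size(n_levels, n_thresholds):
    """Number of threshold settings when each threshold takes n_levels values."""
    return n_levels ** n_thresholds


print(mean_fusion(0.9, 0.45, 0.5, 0.1))
print(label_fusion(0.9, 0.45, (0.5, 0.1), (0.5, 0.1)))
print(label_fusion(0.9, 0.45, (0.5, 0.1), (0.6, 0.05)))
print(grid_size(20, 2), grid_size(20, 4))
```

With the same number of levels per threshold, the label fusion search space grows as the fourth power rather than the square, which is the cost of letting each classifier operate at its own threshold settings.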

Figure 67 shows the location in threshold space for classifiers 1 and 2 of the
optimal solution(s) when the label fusion rule is applied in the case of co-located
sensors with an HMM-based system.

For the target dense environments (10:1, 4:1, 2:1, and 1:1), the optimal threshold
settings remain the same. Classifier 2 thresholds are held at two locations, while

Figure 66. Performance surfaces determined by ROC threshold and reject threshold
settings. Template-based classifier surfaces for co-located sensors
are above the line, and surfaces for independent sensors are below the
line. Mean fusion method, equal target class priors, no prior target
aspect knowledge, and long observation length are used.

Figure 67. Optimal threshold settings for classifiers 1 and 2 when label fusion is
applied. Subplots correspond to class prevalence settings for hostile to
friend/neutral ratios of 10:1, 4:1, 2:1, 1:1, 1:2, 1:4, and 1:10, respectively.
An HMM-based system with co-located sensors, long observation length,
and prior target aspect ±22.5° is used.

classifier 1 thresholds vary widely in the threshold space. In the target sparse environments
the optimal threshold settings change. Indeed, in the case where hostile to
friend/neutral  prior  probability  is  1:2,  the  optimal  threshold  locations  for  classifiers 
1  and  2  separate,  indicating  the  label  fusion  rule’s  flexibility  allows  each  classifier  to 
perform  well  in  a  different  area. 

Figure  68  shows  optimal  threshold  settings  for  the  case  where  there  is  no  prior 
target  aspect  knowledge.  Behavior  similar  to  Fig.  67  is  seen,  where  optimal  threshold 
settings  for  each  classifier  occur  at  different  locations  for  the  target  dense  settings. 
A much smaller jointly-optimal solution space exists due to reduced classifier performance
resulting from lack of prior target aspect information.

Figure 69 shows optimal threshold settings for the case where classifiers 1
and 2 are located on different platforms. Both are
template-based classifiers acting on long observation sequences with prior target
aspect knowledge of ±37.5°. Again, the optimal threshold settings change as the

Figure 68. Optimal threshold settings for sensors 1 and 2 when label fusion is applied.
Subplots correspond to class prevalence settings for hostile to
friend/neutral ratios of 10:1, 4:1, 2:1, 1:1, 1:2, 1:4, and 1:10, respectively.
An HMM-based system with co-located sensors, long observation
length, and no prior target aspect knowledge is used.

hostile  to  friend/neutral  prior  probability  ratio  changes.  Also,  optimal  threshold 
settings  for  classifier  1  and  classifier  2  occur  in  different  locations. 

The label fusion rule provides the multiple classifier system with greater flexibility
in setting its thresholds compared to the other fusion methods. This flexibility
allows the system to optimize sensor performance independently. The information
lost in using a relatively simple set of label fusion rules is compensated by the flexibility
inherent in the larger threshold space. As a result, label fusion performance is
comparable  to  that  of  the  other  fusion  methods. 


5.6.2.6 Combined results

Figures 70-75 provide performance results across all settings within the designed
experiment. The performance results are presented using grayscale: lighter
shades  are  better,  white  is  best,  and  black  is  worst. 

Figure 69. Optimal threshold settings for classifiers 1 and 2 when label fusion
is applied. Subplots correspond to class prevalence settings for hostile to
friend/neutral ratios of 10:1, 4:1, 2:1, 1:1, 1:2, 1:4, and 1:10, respectively.
A template-based system with independent sensors, long observation length, and
prior target aspect ±37.5° is used.

Each figure has two subplots. The top subplot shows performance results for
an HMM-based system. The bottom subplot shows results from a template-based
system. Each pair of subplots shows results from the same designed experimental
settings for sensor location and prior knowledge of target aspect. Performance
results within each subplot explore experimental settings of observation length,
fusion method, and target class prior probabilities.

Using a single subplot as an example and starting down the left-hand side,
five different fusion methods are apparent: Sensor 1 acting independently,
Sensor 2 acting independently, Sensors 1 and 2 fused with a mean fusion rule,
Sensors 1 and 2 fused with a neural network fusion rule, and Sensors 1 and 2
fused using label fusion. Within each of these fusion categories, performance is
given by target class prior probabilities, which are listed on the right-hand
side of the subplot and cover the settings 10:1, 4:1, 2:1, 1:1, 1:2, 1:4, and
1:10 for hostile to friend/neutral target class ratios.

Across the top of the subplot are the three observation length settings: short,
medium,  and  long.  Within  each  observation  length  setting  performance  is  captured 
using  the  same  criteria  as  the  tables  used  earlier.  These  criteria  include  measures  of 
robustness  for  each  performance  category  (labeled  “robust”  at  the  bottom),  the  mean 
feasible  values  for  each  performance  category,  and  the  optimal  jointly-feasible  value. 
The  performance  categories  include  true-positive,  critical  error,  non-critical  error, 
declaration,  out-of-library,  and  joint  performance.  These  categories  are  repeated  for 
each  observation  length  setting. 

Figure 70 shows results for HMM-based (top) and template-based (bottom)
systems where the sensors are co-located and prior knowledge of target aspect is
±22.5°. The best performance at this setting is achieved by the HMM-based system
with neural network fusion. As observation length increases (reading from left
to right), jointly-feasible performance improves, attaining optimal solutions
for nearly all class prevalence settings.

Figure 70. Combined results for HMM-based (top) and template-based (bottom)
classifiers with co-located sensors and ±22.5° prior target aspect knowledge.

Figure 71 shows results for HMM-based (top) and template-based (bottom)
systems where the sensors are co-located and prior knowledge of target aspect is
±37.5°. The degraded prior aspect knowledge affects the HMM-based system at
short observation lengths, as evidenced by the large black jointly-feasible
regions for Sensor 1, Sensor 2, and label fusion. As observation length
increases, the HMM-based classifier system overcomes the degraded prior target
aspect knowledge. With neural network fusion at the longest observation length
setting, performance is near perfect at all but two class prevalence settings.

The  template-based  system  fared  better  than  the  HMM  system  at  the  short 
observation  length  setting.  However,  at  the  long  observation  length  setting  only  a 
few  combinations  yield  jointly-feasible  solutions.  In  both  the  HMM  and  template 
cases  the  neural  network  fusion  method  outperformed  its  competitors. 

Figure 71. Combined results for HMM-based (top) and template-based (bottom)
classifiers with co-located sensors and ±37.5° prior target aspect knowledge.

Figure 72 shows results for HMM-based (top) and template-based (bottom)
systems where the sensors are co-located with no prior knowledge of target
aspect. The degraded prior aspect knowledge affects the template-based
classifier system more dramatically than in Fig. 71. The HMM-based system
provides good performance at the medium and long observation length settings,
with label fusion proving to be best.

Figure 72. Combined results for HMM-based (top) and template-based (bottom)
classifiers with co-located sensors and no prior target aspect knowledge.

Figure 73 shows results for HMM-based (top) and template-based (bottom)
systems where the sensors are located on different platforms with prior
knowledge of target aspect ±22.5°. Compared to Fig. 70, independent sensors
improve feasibility at the short and medium observation length settings for the
HMM-based system, but reduce feasibility for the neural network fusion method,
which had been the best performer.

Figure 73. Combined results for HMM-based (top) and template-based (bottom)
classifiers with independent sensors and ±22.5° prior target aspect knowledge.

Figure 74 shows results for HMM-based (top) and template-based (bottom)
systems where the sensors are located on different platforms with prior
knowledge of target aspect ±37.5°. Independent sensors improve performance
levels over those shown in Fig. 71, but no significant changes in feasibility
are observed.

Figure 74. Combined results for HMM-based (top) and template-based (bottom)
classifiers with independent sensors and ±37.5° prior target aspect knowledge.

Figure 75 shows results for HMM-based (top) and template-based (bottom)
systems where the sensors are located on different platforms with no prior
knowledge of target aspect. Independent sensors improve performance levels over
those shown in Fig. 72. A significant improvement in jointly-feasible space
occurs in the template-based system at the long observation length setting.

Figure 75. Combined results for HMM-based (top) and template-based (bottom)
classifiers with independent sensors and no prior target aspect knowledge.

In summary, the combined results show a broad advantage in robustness and
performance measure values for the HMM-based system over the template-based
system across experimental settings. The combination of HMM-based classifiers
and neural network fusion produced the best results when sensors were co-located
and some prior knowledge of target aspect was available. With no prior aspect
knowledge, the Boolean (label) fusion method performed best: its ability to set
sensor thresholds independently overcame its simple set of label fusion rules.
With independent sensors, the HMM-based system again proved to be best.

6. Contributions and Future Research

This  chapter  describes  research  contributions  and  suggests  future  research. 


6.1 Research contributions

Contributions  from  this  dissertation  research  are  in  the  following  areas: 

•  Development  of  an  HMM-based  time  series  classifier 

• Extension of Laine’s CID optimization framework to include out-of-library
performance

•  Development  of  an  out-of-library  classification  methodology 

• Development of a target pose-estimation methodology using principal
component analysis

•  Application  of  the  extended  framework  to  a  multi-class  ATR  experiment  that 
competes  the  HMM-based  classifier  against  a  template-based  classifier 

•  Development  of  the  framework  to  allow  classifiers  to  make  reject,  or  not  declare, 
decisions,  to  test  classifiers  against  out-of-library  records,  and  to  measure  the 
performance  of  three  different  fusion  methods 

•  Development  of  evidence  for  independent  optimal  threshold  settings  for  label 
fusion 

6.1.1  Literature  review 

A  comprehensive  review  of  the  literature  covers  the  theory  and  development 
of  hidden  Markov  models.  The  application  of  HMMs  to  ATR  problems  using  high 
range-resolution  radar  signatures  as  features  is  described  in  Sec.  2.1.3.10,  and  it 
reveals  limitations  in  treatment  of  prior  knowledge  of  target  aspect,  inclusion  of  a 
rejection option, and performance considering out-of-library targets. Other research
areas  covered  in  the  literature  review  include  model  complexity  in  HMMs,  multiple 
classifier  fusion,  rejection  theory,  and  Laine’s  CID  optimization  framework. 

6.1.2  Development  of  HMM-based  classifier 

Chapter  3  describes  the  development  of  an  HMM-based  time  series  classifier. 
Ultimately, the methodology results in a multi-dimensional Gaussian HMM
operating on HRR-derived feature data. The model takes as input a sequence of feature
data  ordered  by  target  aspect  angle.  The  model  develops  relationships  between  the 
observation  distribution  associated  with  each  hidden  state  and  the  signature  of  the 
target  within  a  range  of  aspect  angles. 
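
The link between hidden states and aspect-angle bins can be sketched as follows.
The snippet builds the cyclic bi-diagonal transition matrix and the aspect bins
used to initialize each state's Gaussian, following the initialization in
train_HMM.m (Appendix B.3). The small state count and the random feature matrix
are placeholders chosen only for illustration.

```matlab
num_states = 6;            % small illustrative value (assumption)
feat = randn(10, 360);     % placeholder: 10 features x 360 aspect angles

% Aspect bin edges: state i is initialized from columns xx(i):xx(i+1).
binwidth = round((360 - 1) / num_states);
xx = binwidth * (0:num_states);
xx(1) = 1;
xx(end) = 360;

% Cyclic bi-diagonal transitions: remain in a state or advance to the next,
% wrapping from the last state back to the first.
trans = zeros(num_states);
for i = 1:num_states
    fwd_state = mod(i, num_states) + 1;
    trans(i, i) = 0.5;
    trans(i, fwd_state) = 0.5;
end

% Initial Gaussian parameters for each state from its binned observations.
mu = zeros(size(feat, 1), num_states);
sigma = zeros(size(feat, 1), size(feat, 1), num_states);
for i = 1:num_states
    cols = xx(i):xx(i+1);
    mu(:, i) = mean(feat(:, cols), 2);
    sigma(:, :, i) = diag(std(feat(:, cols), 0, 2));
end
```

Each row of trans sums to one, and the wrap-around entry in the last row lets a
sequence pass through 360° of aspect back to the first state.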

6.1.3  Extended  CID  framework 

Chapter 4 extends Laine’s CID optimization framework by including an
out-of-library performance measure. The framework retains the desired characteristic of
allowing  trade-off  analysis  without  explicit  classification  error  costs. 

6.1.4 Development of out-of-library methodology

Section 4.3.2.5 describes a methodology whereby a classifier assigns an
estimated posterior probability of out-of-library class membership to a test
record. This methodology is implemented as a post-processing step after the
classifier trained on in-library classes has adjudicated the test record. The
methodology produces the estimated out-of-library posterior probability as a
function of the in-library class posterior probabilities produced by the
classifier. At some experimental settings the out-of-library discriminator
correctly identified 60% of out-of-library records.

Figure 76. SAR chip image processing steps lead to a target mask that is
evaluated using principal component analysis, resulting in an estimated target
pose.


6.1.5  Development  of  target  pose  estimator 


Section  5.4.5  develops  an  on-line  method  to  estimate  target  aspect  angle  based 
on  a  target  mask  of  a  SAR  image.  The  method  uses  principal  component  analysis  to 
find  the  major  axis  of  the  target  mask.  An  initial  experiment  found  pose  estimation 
error  to  be  roughly  11°.  Figure  76  highlights  the  steps  taken  to  estimate  target  pose. 
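
A minimal sketch of the principal-axis step is given below. It assumes a binary
target mask is already available (here a synthetic rotated ellipse stands in for
a processed SAR chip) and it does not reproduce the image processing steps of
Section 5.4.5; the mask geometry and all variable names are hypothetical.

```matlab
% Synthetic binary target mask (assumption): an elongated blob whose long
% axis is rotated 30 degrees in the column/row coordinate frame.
[cols, rows] = meshgrid(-32:32, -32:32);
theta_true = 30 * pi / 180;
u =  cols * cos(theta_true) + rows * sin(theta_true);   % along-axis coordinate
v = -cols * sin(theta_true) + rows * cos(theta_true);   % cross-axis coordinate
mask = (u / 20).^2 + (v / 6).^2 <= 1;

% Coordinates of the mask pixels, centred on the mask centroid.
pts = [cols(mask), rows(mask)];
pts = pts - repmat(mean(pts, 1), size(pts, 1), 1);

% Principal component analysis: eigenvectors of the 2 x 2 coordinate covariance.
[vecs, vals] = eig(cov(pts));
[~, idx] = max(diag(vals));
major_axis = vecs(:, idx);

% Orientation of the major axis, folded into [0, 180) degrees because a mask
% alone cannot distinguish the front of the target from the rear.
pose_est = mod(atan2(major_axis(2), major_axis(1)) * 180 / pi, 180);
fprintf('estimated pose: %.1f deg\n', pose_est);
```

The eigenvector with the largest eigenvalue points along the direction of
greatest pixel spread, which is taken as the major axis of the mask. The
front/rear ambiguity of 180° has to be resolved by other means.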


6.1.6  Application  of  extended  CID  framework 

Chapter  5  details  the  application  of  the  extended  CID  framework  to  an  ATR 
experiment  using  DCS  radar  SAR  data.  The  experiment  competes  an  HMM-based 
system  (a  derivative  of  the  Chapter  3  system)  against  a  template-based  classifier. 
The  extended  framework  allows  the  systems  to  be  compared  inclusive  of  warfighter 
constraints,  rejection  option,  and  out-of-library  target  records.  Results  show  that  the 
HMM-based  system  provides  the  warfighter  with  better  and  more  robust  performance 
across a variety of experiment settings, including fusion rule, hostile/friend class
prevalence, observation length, and prior knowledge of target aspect angle.
Figure 77 shows that the HMM-based classifier (white) is preferred over the
template-based classifier (black) at a majority of experimental settings. Also,
the size of the feasible region in the threshold space is shown to provide a
simple comparative measure of classifier robustness, and performance surfaces
are shown to convey performance information and trade-space efficiently.

Figure 77. Differential results across experimental settings with co-located
sensors and no prior target aspect knowledge. White portions indicate better
performance for HMM-based classifier, black portions indicate better
performance for template-based classifier, and gray indicates equal
performance.
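
The use of feasible-region size as a robustness measure can be illustrated with
a small sketch. It sweeps a grid of threshold pairs, evaluates performance
surfaces, and reports the fraction of grid points that satisfy every
constraint. The surfaces and the constraint levels below are invented for
illustration and do not come from the experiment.

```matlab
% Grid over a two-dimensional threshold space (assumed range and resolution).
[t1, t2] = meshgrid(0:0.05:1, 0:0.05:1);

% Invented performance surfaces standing in for measured rates.
true_pos = 1 - 0.5 * t1 .* t2;             % true-positive rate
crit_err = 0.2 * (1 - t1) .* (1 - t2);     % critical error rate
declare  = 1 - 0.6 * max(t1, t2);          % declaration rate

% Hypothetical warfighter constraints on each performance measure.
feasible = (true_pos >= 0.80) & (crit_err <= 0.05) & (declare >= 0.60);

% Robustness proxy: fraction of the threshold grid that is jointly feasible.
robustness = nnz(feasible) / numel(feasible);
fprintf('jointly feasible fraction: %.3f\n', robustness);
```

A classifier system with a larger jointly-feasible fraction tolerates more
variation in its threshold settings before violating a constraint.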


6.1.7  Evidence  of  independent  threshold  setting  in  fused  system 

Laine’s research [15] demonstrates that independent thresholding for each
classifier prior to applying label fusion allows improved performance over the
application of a single threshold after the fusion of classifier outputs.
Section 5.6.2.5 shows that independent thresholding yields optimal thresholds in
different locations of the threshold space for each classifier. This added
flexibility allows the label fusion method to combine a classifier that performs
well in one performance measure but poorly elsewhere with a second classifier
whose threshold setting allows it to perform well in another area. Figure 78
shows an example of the optimal threshold settings for two classifiers.

Figure 78. Optimal threshold settings for classifiers 1 and 2 when label fusion
is applied. Note settings occur in different locations for each classifier.

6.2  Future  research 

Two  areas  for  future  research  follow. 

1. This research chooses an arbitrary feature set from HRR signatures of the
targets. The literature does not identify a method for determining an appropriate
number  of  features  to  extract  from  HRR  profiles.  The  number  of  scattering 
centers  is  most  likely  related  to  target  type  and  signal-to-noise  ratio  in  the  SAR 
image.  Thus  a  feature  saliency  methodology  could  be  developed  to  choose  HRR 
features. 

2. Converting the output of multiple HMM classifiers to class posterior
probabilities allows the implementation of a Bayesian network that could learn a
Bayes-optimal combination of classifiers with respect to classification accuracy.
This concept could also be implemented at the feature level to determine an
optimal weighting of the selected features used in classification.

Appendix A. List of Abbreviations

ACC/DRSA  Air  Combat  Command’s  CID  issues  branch 

AFDD  Air  Force  Doctrine  Document 

AFIT  Air  Force  Institute  of  Technology 

AFPAM  Air  Force  pamphlet 

AFRL  Air  Force  Research  Laboratory 

AIC  Akaike’s  information  criterion 

ANN  Artificial  neural  network 

ATD/R  Automatic  target  detection  and  recognition 

ATR  Automatic  target  recognition 

BIC  Bayesian  information  criterion 

CAD  Computer-aided  design 

CID  Combat  identification 

DAI-DAO  Data  in  -  data  out  fusion 

DARPA  Defense  Advanced  Research  Projects  Agency 

DEI-DEO  Decision  in  -  decision  out  fusion 

DNA  Deoxyribonucleic  acid 

DTMC  Discrete  time  Markov  chain 

EM  Expectation  maximization 

EOC  Extended  operating  conditions 

FEI-DEO  Features  in  -  decision  out  fusion 

FEI-FEO  Features  in  -  features  out  fusion 

FFT  Fast  Fourier  transform 

FLIR  Forward-looking  infrared 

FN  Friend/neutral

FOS  Family  of  systems 

HMM  Hidden  Markov  model 

HRR  High  range-resolution  radar 

ISR  Intelligence, surveillance, and reconnaissance


JFC  Joint Forces Commander

JP  Joint Publication

MCS  Multiple classifier system

MLE  Maximum likelihood estimator

MLPNN  Multi-layer perceptron neural network

MP  Mathematical programming

MSTAR  Moving and stationary target acquisition and recognition

OH  Other hostile

OOL  Out-of-library

PCA  Principal component analysis

PCS  Probability of correct selection

ROC  Receiver operating characteristic

ROI  Region of interest

SAR  Synthetic aperture radar

SNL  Sandia National Laboratory

TOD  Target of the day

TPR  True-positive rate

UCSC  University of California at Santa Cruz

USAF  United States Air Force

Appendix B. MATLAB code

B.1 run_script.m

% Tim Albrecht
% AFIT/ENS
% Sep 2005

% this script will perform multiple runs of the HMM DCS Project

clear
close all

% disable warning messages
warning off all

% add path to m-files from HMM Toolbox
addpath([pwd,'\tools'])


%%%%%%%%%%%%%%%%%%%%%%%
%   Data Structures   %
%%%%%%%%%%%%%%%%%%%%%%%

ghmm_results = struct('log_lik',[]);
% structure to save log-likelihoods at each experiment setting
% 'ghmm_results(target,num_states,seq_length,feature_set).log_lik =
%     [test records, label class];'

ool_results = struct('log_lik',[]);
% out of library test results
% 'ool_results(target,num_states,seq_length,feature_set).log_lik =
%     [test records, label class];'

sensor_post = struct('post',[],'ool_post',[]);
% structure to save posteriors from each sensor (i.e. feature set)
% at each experiment setting
% 'sensor_post(target,num_states,seq_length,feature_set).post = ...
%     [test records, label class];'

loglik_post = struct('post',[],'ool_post',[]);
% structure to save posteriors from mean loglik fused sensors at
% each experiment setting
% 'loglik_post(data_class,num_states,seq_length).post = ...
%     [records label class];'

ann_post = struct('post',[],'ool_post',[]);
% structure to save posteriors from ann fused sensors at
% each experiment setting
% 'ann_post(num_states,seq_length).post = ...
%     [records label class];'

save([pwd,'\output'],'ghmm_results','ool_results','sensor_post',...
    'loglik_post','ann_post')
init_hmm = struct('prior',[],'trans',[],'mu',[],'sigma',[]);
% structure to save init ghmm parameters at each experiment setting
% 'init_hmm(target,num_states,feature_set).prior = []'

train_hmm = struct('prior',[],'trans',[],'mu',[],'sigma',[]);
% structure to save trained ghmm parameters at each experiment setting
% 'train_hmm(target,num_states,feature_set).prior = []'

ann = struct('net',[]);
% structure to save trained ann fusor
% 'ann(num_states,seq_length).net'

ann_data = struct('log_lik',[],'post',[]);
% data structure to hold training records, log-liks, and posteriors
% 'ann_data(target,feature_set,seq_length).log_lik = [record targetHMM]
%     .post = [record targetHMM]'

save([pwd,'\param'],'init_hmm','train_hmm','ann','ann_data');


%%%%%%%%%%%%%%
%  Settings  %
%%%%%%%%%%%%%%

settings = struct('num_states',[],'test_length',[],...
    'feature_sets',[],'feature_num',[],'targets',[]);
save([pwd,'\data'],'settings')

% '-.num_states' is the number of hidden states in the hidden
% Markov model; defines the complexity of the model.
% '-.test_length' is the number of observations in a single test record;
% here we have various settings, all less than or equal to the training
% sequence length.
% '-.feature_sets' name (and number) of feature sets in experiment
% '-.feature_num' number of features in each feature set
% '-.targets' name (and number) of target classes in experiment

settings.num_states = [10 20 30 40 60 72 90];
settings.test_length = [4 10 20];
settings.feature_sets = {'HH','VV'};
settings.feature_num = [10 10];

% order the target list with 1 TOD, 4 OH, and 5 FN
settings.targets = {'target_1','target_2','target_5','target_10',...
    'target_13','target_6','target_7','target_11',...
    'target_12','target_15'};
save([pwd,'\data'],'settings','-append')

%%%%%%%%%%%%%%%
%  Scripting  %
%%%%%%%%%%%%%%%

for i = 1:size(settings.num_states,2) % hidden state loop

    state = settings.num_states(i);

    % clocking mechanism
    tic

    disp(' ')
    disp(['begin train/test scripting at ',int2str(state),...
        ' states'])

    % 'build_train' builds training seq across 360 deg
    build_train

    % 'train_HMM' trains hmms for each target type and each feature set at
    % the given experiment settings
    train_HMM(state)

    t = toc;
    disp(['trained at ',int2str(state),' states. Time = ',...
        num2str(t/60),' mins'])

    for k = 1:size(settings.test_length,2) % sequence length loop

        seq_length = settings.test_length(k);

        % clocking mechanism
        tic

        % 'build_trainANN' builds train records for ANN fusor training
        build_trainANN(state, seq_length)

        % 'train_ANN' trains ANN fuser using HMM outputs given sequences
        % from training set
        train_ANN(state, seq_length)

        % 'build_test' builds 100 test records of each target type for
        % each feature set. the initial aspect angle of the records is
        % chosen randomly and spans an interval of angles defined by
        % 'seq_length'.
        build_test(state, seq_length)

        % 'test_HMM' takes the test records and evaluates them using
        % the trained HMMs.
        test_HMM(state, seq_length)

        % fuse HMM outputs using trained ANNs
        test_ANN(state, seq_length)

        % test trained HMMs using out-of-library records, fuse using ANN
        % fuser, record results
        test_outlibrary(state, seq_length)

        t = toc;
        disp(['tested at sequence length ',int2str(seq_length),...
            '. Time = ',num2str(t/60),' mins'])

    end

end

B.2 build_train.m

function build_train

% Tim Albrecht
% AFIT/ENS
% Sep 2005

% 'build_train' builds the training data sets for the HMMs

% load settings
load([pwd,'\data'],'settings')
% settings.feature_sets = {'HH','VV'};
% settings.targets = {'target_1','target_2','target_5','target_10',...
%     'target_13','target_6','target_7','target_11',...
%     'target_12','target_15'};

% create data structure to store training records
train_records = struct('data',[]);
% train_records(target,feature_set).data = [target exemplar];

% build training sequence across 360 degrees of aspect angle
for target = 1:length(settings.targets) % num targets
    for feature_set = 1:length(settings.feature_sets) % num feature sets

        % load the appropriate feature set/target data into structure
        % called 'data'
        load_str = [pwd,'\trial8_data\','train_',...
            settings.feature_sets{feature_set}];
        % NOTE: the '_' separator below is assumed; it is illegible in
        % the source listing
        target_str = [settings.targets{target},'_',...
            settings.feature_sets{feature_set}];
        data = load(load_str,target_str);
        % pull the loaded variable out of the struct returned by load
        data = data.(target_str);

        for num_exemplar = 1:1 %size(data,2) for case of multiple exemplars
            temp_data = data(num_exemplar).feature;
            train_records(target,feature_set).data = temp_data;
        end % end exemplar loop
    end % end feature set loop
end % end target set loop

save([pwd,'\data'],'train_records','-append')

B.3 train_HMM.m

function train_HMM(num_states)

% Tim Albrecht
% AFIT/ENS
% Sep 2005

% 'train_HMM' trains the HMMs used in the classification scheme.

% load the training data sequences and settings
load([pwd,'\data'],'train_records','settings')
% 'train_records(target,feat_set).data = [feature_dim obs];'
% settings.feature_sets = {'HH','VV'};
% settings.feature_num = [10 10];
% settings.targets = {'target_1','target_2','target_5','target_10',...
%     'target_13','target_6','target_7','target_11',...
%     'target_12','target_15'};

% load the ghmm parameter data structures
load([pwd,'\param'],'init_hmm','train_hmm')

% set aspect window bins according to number of hidden states
binwidth = round((360 - 1) ./ num_states);
xx = binwidth*(0:num_states); xx(1) = 1;
xx(length(xx)) = 360;

% create initial HMM matrices: the prior, the state transition
% and mu and sigma of observation distributions
for target = 1:length(settings.targets) % num targets
    for feature_set = 1:length(settings.feature_sets) % num feature sets

        % force training sequence to begin in state 1
        init_hmm(target,num_states,feature_set).prior = ...
            zeros(num_states,1);
        init_hmm(target,num_states,feature_set).prior(1,1) = 1;

        % create bi-diagonal hidden state structure
        init_hmm(target,num_states,feature_set).trans = ...
            zeros(num_states);
        for i = 1:num_states
            fwd_state = mod(i, num_states) + 1;
            init_hmm(target,num_states,feature_set).trans(i,fwd_state) = 0.5;
            init_hmm(target,num_states,feature_set).trans(i,i) = 0.5;
        end

        temp = train_records(target,feature_set).data;

        % initialize mu and sigma by averaging and taking std of bin'd
        % feature observations (determined by number of states)
        for i = 1:num_states
            init_hmm(target,num_states,feature_set).mu(:,i) = ...
                mean(temp(:,(xx(i):xx(i+1))),2);
            init_hmm(target,num_states,feature_set).sigma(:,:,i) = ...
                diag(std(temp(:,(xx(i):xx(i+1))),0,2));
        end

    end % end feature set loop
end % end target loop

% train the hmms, one per target per feature set
for target = 1:length(settings.targets) % num targets
    for feature_set = 1:length(settings.feature_sets) % num feature sets

        [LL, train_hmm(target,num_states,feature_set).prior,...
            train_hmm(target,num_states,feature_set).trans,...
            train_hmm(target,num_states,feature_set).mu,...
            train_hmm(target,num_states,feature_set).sigma,...
            mu_history] = ...
            learn_ghmm(num2cell(train_records(target,feature_set).data,[1 2]),...
            init_hmm(target,num_states,feature_set).prior,...
            init_hmm(target,num_states,feature_set).trans,...
            init_hmm(target,num_states,feature_set).mu,...
            init_hmm(target,num_states,feature_set).sigma,...
            'max_iter',10,'thresh',1e-5,'verbose',0,'cov_type','diag');

    end % end feature set loop
end % end target loop

save([pwd,'\param'],'init_hmm','train_hmm','-append')

B.4 build_trainANN.m

function build_trainANN(num_states, seq_length)

% Tim Albrecht
% AFIT/ENS
% Sep 2005

% 'build_trainANN' takes sequence length as input and builds the
% training seq to be used to train the neural network fuser in the
% classification scheme. also uses num_states to determine initial state
% for testing sequences (prior aspect knowledge case)

% load settings
load([pwd,'\data'],'settings')
% settings.feature_sets = {'HH','VV'};
% settings.feature_num = [10 10];
% settings.targets = {'target_1','target_2','target_5','target_10',...
%     'target_13','target_6','target_7','target_11',...
%     'target_12','target_15'};

train_records_ann = struct('record',[]);
% train_records_ann(target,feature_set,record#).record =
%     [feature_dim obs];

% create data structure to store test records
train_index = struct('prior',[]);
% train_index(target,state,seq_length,record).prior = column vector
% containing initial state distribution of each test record, revised to
% represent aspect knowledge to within +/- 22.5 deg independent of num of
% hidden states

% generate random starting aspect angles; create 200 start points per
% target type;
for target = 1:length(settings.targets) % num targets

    num_exemplar = 1; %size(data,2); multi exemplar case

    % pick random start points for ann training set
    index = randperm(360);
    ann_index = sort(index(1:100)); %200));

    % insert code to consider prior target azimuth knowledge
    ang_cov = 360/num_states; % observation wedge covered by single state
    ang_cov_half = ang_cov/2; % half-width
    state_cov = round(22.5/ang_cov_half); % num states needed to cover 45 deg
    state_prob = 1/state_cov; % uniform prior over number of reqd states

    % builds a uniform distribution across the appropriate number of hidden
    % states within the prior dist vector
    for i = 1:size(ann_index,2)
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temp  =  zeros (mim_states , 1) ; 

est_state  =  floor ( (ann_index(i) -1) /ang_cov)+l ; 
if  rem(state_cov,2)  ~=  0  °/0odd  num  states 

temp(est_state)  =  state_prob;  °/0  mid-point 
for  j  =  1 : (state_cov-l) /2 
°/0  lower  half 
if  est_state  -  j  <=  0 

temp_index  =  mod(est_state  -  j  +  num_states ,num_ states  +  1) ; 

else 

temp_index  =  est_state  -  j ; 

end 

temp(temp_index)  =  state_prob; 

°/0  upper  half 

if  est_state  +  j  >  num_states 

temp_index  =  mod(est_state  +  j ,num_states) ; 

else 

temp_index  =  est_state  +  j ; 

end 

temp(temp_index)  =  state_prob; 

end 

else  °/0  even  num  states 
for  j  =  1 : state_cov/2 
°/0  lower  mid-points 
if  est_state  -  j  +  1  <=  0 
temp_index  =  ... 

mod(est_state-j+l+num_states ,num_states+l) ; 

else 

temp_index  =  est_state  -  j  +  1; 

end 

temp(temp_index)  =  state_prob; 

°/0  upper  mid-points 
if  est_state  +  j  >  num_states 

temp_index  =  mod(est_state  +  j ,num_states) ; 

else 

temp_index  =  est_state  +  j ; 

end 

temp(temp_index)  =  state_prob; 

end 

end 

train_index (target ,num_states , seq_length, i) .prior  =  temp; 

end 

    % build test sequences by target type (ann train set)
    for i = 1:size(ann_index,2) % num test records per target type
        % for each record, must choose from among exemplars of target class
        tar_exemplar = randperm(num_exemplar);
        tar_exemplar = tar_exemplar(1);

        for feature_set = 1:length(settings.feature_sets) % num feature sets
            % load the appropriate feature set/target data into structure
            % called 'data'
            load_str = [pwd,'\trial8_data\','train_',...
                settings.feature_sets{feature_set}];
            target_str = [settings.targets{target},'_',...
                settings.feature_sets{feature_set}];
            data = load(load_str,target_str);
            eval(['data = data.',settings.targets{target},'_',...
                settings.feature_sets{feature_set},';'])
            % data = data(tar_exemplar);

            % pull full feature data into temp structure
            temp = data.feature;
            temp = [temp temp]; % account for possibility of wrapping around
                                % from 360 degrees back to 1

            % crop to seq_length
            temp2 = temp(:,ann_index(i):(ann_index(i) + seq_length-1));
            train_records_ann(target,feature_set,i).record = temp2;
        end % end feature set loop
    end % end ann train sequence loop
end % end target type loop

save([pwd,'\data'],'train_records_ann','train_index','-append')
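
The aspect-knowledge prior built in the listing above can be restated compactly. The following is a minimal sketch in Python rather than the MATLAB of the thesis; the function name `aspect_prior` and the 0-based state indexing are assumptions made for illustration. It places a uniform probability over the `state_cov` hidden states centered on the state that contains the starting aspect angle, and wraps around the 360-degree boundary with modular arithmetic in place of the explicit lower/upper branches.

```python
import math


def aspect_prior(start_angle, num_states, half_width_deg=22.5):
    """Uniform prior over the hidden states near a 1-based start angle (1..360).

    Hypothetical helper: mirrors the window logic of the listing, but uses
    0-based state indices and modular wrap-around.
    """
    ang_cov = 360.0 / num_states  # wedge of aspect angle covered by one state
    # Round half up, as MATLAB's round does for positive values
    # (Python's built-in round() rounds halves to the nearest even integer).
    state_cov = int(math.floor(half_width_deg / (ang_cov / 2.0) + 0.5))
    if state_cov < 1:
        raise ValueError("num_states is too small for the requested window")

    est_state = int((start_angle - 1) // ang_cov)  # 0-based state of the start angle
    if state_cov % 2 == 1:
        # odd window: est_state is the mid-point
        lowest = est_state - (state_cov - 1) // 2
    else:
        # even window: est_state is the upper of the two lower mid-points
        lowest = est_state - (state_cov // 2 - 1)

    prior = [0.0] * num_states
    for k in range(state_cov):
        prior[(lowest + k) % num_states] = 1.0 / state_cov
    return prior
```

For example, with 16 hidden states each state covers 22.5 degrees, so two states are needed to span 45 degrees; a start angle of 360 then puts probability 0.5 on the last state and 0.5 on the first, which is the wrap-around case the listing handles with `mod`.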
B.5 train_ANN.m

function train_ANN(num_states, seq_length)

% Tim Albrecht
% AFIT/ENS
% Sep 2005

% 'train_ANN' feeds training records to the trained HMMs, produces
% log-likelihoods, converts to posterior probs, uses these posteriors to
% train an ANN fuser, and saves the trained ANN fusers.

% load the trained HMM parameters and ANN networks
load([pwd,'\param'],'train_hmm','init_hmm','ann','ann_data')
% 'train_hmm(target,num_states,feature_set).prior = []'
% 'ann(num_states,seq_length).net'
% 'ann_data(target,feature_set,seq_length).log_lik = [record targetHMM]
%                                          .post = [record targetHMM]'

% load the training sequences
load([pwd,'\data'],'train_records_ann','train_index','settings')
% 'train_records_ann(target,feature_set,record#).record =
%     [feature_dim observation]'
% 'train_index(target,state,seq_length,record).prior = column vector'
% settings.feature_sets = {'HH','VV'};
% settings.feature_num = [10 10];
% settings.targets = {'target_1','target_2','target_5','target_10',...
%     'target_13','target_6','target_7','target_11',...
%     'target_12','target_15'};

% send training records to trained HMMs; produce log-likelihoods
for target = 1:length(settings.targets) % class 'target' data
    for j = 1:length(settings.targets) % against class 'j' trained hmms
        for feature_set = 1:length(settings.feature_sets) % num feature sets
            ghmm_ll = [];
            for k = 1:size(train_records_ann,3) % num test records
                % insert code to force prior aspect knowledge
                prior = ...
                    train_index(target,num_states,seq_length,k).prior;
                % for no knowledge use
                % prior = normalise(ones(num_states,1));
                ghmm_ll(k) = ...
                    log_lik_ghmm(train_records_ann(target,feature_set,k).record,...
                    prior,...
                    train_hmm(j,num_states,feature_set).trans,...
                    train_hmm(j,num_states,feature_set).mu,...
                    train_hmm(j,num_states,feature_set).sigma);
            end % test record loop
            ghmm_ll = ghmm_ll';
            ann_data(target,feature_set,seq_length).log_lik(:,j) = ghmm_ll;
        end % end feature set loop
    end % end against j trained hmm loop
end % end data target type loop

% add truth to log-liks, build training set for ANN
data = [];
for feature_set = 1:size(ann_data,2) % num feature sets
    temp_holder = [];
    for target = 1:size(ann_data,1) % num targets
        temp_holder = ...
            [temp_holder ann_data(target,feature_set,seq_length).log_lik'];
    end
    % data ends up having (num_targets)*num_featuresets rows and
    % num_targets*num_records columns
    data = [data; temp_holder];
end % end feature set loop

% preprocess data to -1 1 range, save min/max for preprocessing testing
% data
[data min_d max_d] = premnmx(data);
num_records = size(train_records_ann,3);

truth = zeros(size(ann_data,1), size(ann_data,1)*num_records);
for i = 1:size(ann_data,1)
    truth(i,((i-1)*num_records+1):(i*num_records)) = ones(1,num_records);
end

% build FFMLP net
net = newff([-1*ones(20,1) ones(20,1)], [40 10],...
    {'tansig','logsig'});

% set parameters
net.trainFcn = 'traingdx';
net.trainParam.epochs = 3000;
net.trainParam.show = NaN;
net.trainParam.goal = .00001;

% train net
[net tr] = train(net,data,truth);

% save net
ann(num_states,seq_length).net = net;

% save trained ANN fuser
save([pwd,'\param'],'ann','min_d','max_d','ann_data','-append')
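
The preprocessing and target construction in the listing above can be illustrated outside MATLAB. The sketch below, in Python, assumes that `premnmx` scales each row to [-1, 1] as 2(x - min)/(max - min) - 1 and that `tramnmx` reapplies the stored row minima and maxima; the function names `fit_minmax`, `apply_minmax`, and `block_truth` are hypothetical and not part of the thesis code. Rows are features (one per HMM and feature set) and columns are records, as in the `data` matrix.

```python
def fit_minmax(rows):
    """Return per-row minima and maxima of the training data."""
    mins = [min(row) for row in rows]
    maxs = [max(row) for row in rows]
    return mins, maxs


def apply_minmax(rows, mins, maxs):
    """Scale each row with stored minima/maxima so training data lie in [-1, 1].

    A constant row (max equal to min) would divide by zero; this sketch does
    not handle that case.
    """
    return [
        [2.0 * (x - lo) / (hi - lo) - 1.0 for x in row]
        for row, lo, hi in zip(rows, mins, maxs)
    ]


def block_truth(num_classes, num_records):
    """One row per class; ones over that class's contiguous block of records."""
    total = num_classes * num_records
    return [
        [1.0 if c * num_records <= col < (c + 1) * num_records else 0.0
         for col in range(total)]
        for c in range(num_classes)
    ]
```

Test data scaled with the training minima and maxima may fall outside [-1, 1], which is expected when a test log-likelihood exceeds the range seen in training.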
B.6 build_test.m

function build_test(num_states, seq_length)

% Tim Albrecht
% AFIT/ENS
% Sep 2005

% 'build_test' takes sequence length as input and builds the
% testing data set to be used in the classification experiment
% also uses num_states to determine initial state for testing sequences
% (prior aspect knowledge case)

% load settings
load([pwd,'\data'],'settings')
% settings.feature_sets = {'HH','VV'};
% settings.feature_num = [10 10];
% settings.targets = {'target_1','target_2','target_5','target_10',...
%     'target_13','target_6','target_7','target_11',...
%     'target_12','target_15'};

% create data structure to store test records
test_records = struct('record',[]);
% test_records(target,feature_set,record).record = [feature_dim observation];
test_index = struct('prior',[]);
% test_index(target,state,seq_length,record).prior = column vector
% containing prior distribution using prior aspect knowledge

% generate random starting aspect angles; create 100 start points per
% target type;

for target = 1:length(settings.targets) % num targets
    num_exemplar = 1; %size(data,2); multiple exemplar case

    % pick random start points for hmm_test set and pnn_train set
    index = randperm(360);
    hmm_test = sort(index(1:100));

    % insert code to consider prior target azimuth knowledge
    ang_cov = 360/num_states; % aspect angle wedge covered by single state
    ang_cov_half = ang_cov/2; % half-width
    state_cov = round(22.5/ang_cov_half); % num states needed to cover 45 deg
    state_prob = 1/state_cov; % uniform prior over number of reqd states

    % builds a uniform distribution across the appropriate number of hidden
    % states within the prior dist vector
    for i = 1:size(hmm_test,2)
        temp = zeros(num_states,1);
        est_state = floor((hmm_test(i)-1)/ang_cov)+1;
        if rem(state_cov,2) ~= 0 % odd num states
            temp(est_state) = state_prob; % mid-point
            for j = 1:(state_cov-1)/2
                % lower half
                if est_state - j <= 0
                    temp_index = mod(est_state - j + num_states,num_states + 1);
                else
                    temp_index = est_state - j;
                end
                temp(temp_index) = state_prob;

                % upper half
                if est_state + j > num_states
                    temp_index = mod(est_state + j,num_states);
                else
                    temp_index = est_state + j;
                end
                temp(temp_index) = state_prob;
            end
        else % even num states
            for j = 1:state_cov/2
                % lower mid-points
                if est_state - j + 1 <= 0
                    temp_index = ...
                        mod(est_state-j+1+num_states,num_states+1);
                else
                    temp_index = est_state - j + 1;
                end
                temp(temp_index) = state_prob;

                % upper mid-points
                if est_state + j > num_states
                    temp_index = mod(est_state + j,num_states);
                else
                    temp_index = est_state + j;
                end
                temp(temp_index) = state_prob;
            end
        end
        test_index(target,num_states,seq_length,i).prior = temp;
    end


    % build test sequences by target type (hmm test set)
    for i = 1:size(hmm_test,2) % num test records per target type
        % for each record, must choose from among exemplars of target class
        tar_exemplar = randperm(num_exemplar);
        tar_exemplar = tar_exemplar(1);

        for feature_set = 1:length(settings.feature_sets) % num feature sets
            % load the appropriate feature set/target data into structure
            % called 'data'
            load_str = [pwd,'\trial8_data\','test_',...
                settings.feature_sets{feature_set}];
            target_str = [settings.targets{target},'_',...
                settings.feature_sets{feature_set}];
            data = load(load_str,target_str);
            eval(['data = data.',settings.targets{target},'_',...
                settings.feature_sets{feature_set},';'])
            % data = data(tar_exemplar);

            % pull full feature data into temp structure
            temp = data.feature;
            temp = [temp temp]; % account for possibility of wrapping around
                                % from 360 degrees back to 1

            % crop to seq_length
            temp2 = temp(:,hmm_test(i):(hmm_test(i) + seq_length-1));
            test_records(target,feature_set,i).record = temp2;
        end % end feature set loop
    end % end hmm test sequence loop
end % end target type loop

save([pwd,'\data'],'test_records','test_index','-append')
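
The sequence extraction in the listing above duplicates the feature matrix so that a sequence starting near 360 degrees can run past the end and continue from 1 degree. A minimal Python sketch of the same idea follows; the name `crop_with_wrap` is hypothetical, and the sketch assumes one feature column per aspect angle, indexed from 1 as in the MATLAB code. It uses modular indexing instead of concatenating two copies of the matrix.

```python
def crop_with_wrap(features, start, seq_length):
    """Extract seq_length consecutive columns beginning at a 1-based start column.

    features is a list of rows (one per feature dimension); columns that run
    past the last aspect angle continue from the first one.
    """
    num_cols = len(features[0])
    if not 1 <= start <= num_cols:
        raise ValueError("start must lie in 1..num_cols")
    if not 1 <= seq_length <= num_cols:
        raise ValueError("seq_length must lie in 1..num_cols")
    return [
        [row[(start - 1 + k) % num_cols] for k in range(seq_length)]
        for row in features
    ]
```

With six columns, a start of 5 and a sequence length of 4 selects columns 5, 6, 1, 2, which matches indexing columns 5 through 8 of the doubled matrix.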
B.7 test_HMM.m

function test_HMM(num_states, seq_length)

% Tim Albrecht
% AFIT/ENS
% Sep 2005

% 'test_HMM' evaluates the test records

% load the trained HMM parameters
load([pwd,'\param'],'train_hmm','init_hmm')
% 'train_hmm(target,num_states,feature_set).prior = []'

% load the test sequences
load([pwd,'\data'],'test_records','test_index','settings')
% 'test_records(target,feature_set,record#).record = [feature_dim obs];
% 'test_index(target,state,seq_length).hmm = row vector
% settings.feature_sets = {'HH','VV'};
% settings.feature_num = [10 10];
% settings.targets = {'target_1','target_2','target_5','target_10',...
%     'target_13','target_6','target_7','target_11',...
%     'target_12','target_15'};

% load the output information
load([pwd,'\output'],'ghmm_results')
% 'ghmm_results(target,num_states,seq_length,feature_set).log_lik = ...
%     [test records, hmm class]'

for target = 1:length(settings.targets) % class 'target' data
    for j = 1:length(settings.targets) % against class 'j' trained hmms
        for feature_set = 1:length(settings.feature_sets) % num feature sets
            ghmm_ll = [];
            for k = 1:size(test_records,3) % num test records
                % insert code to use prior aspect angle information
                prior = ...
                    test_index(target,num_states,seq_length,k).prior;
                % for no knowledge use
                % prior = normalise(ones(num_states,1));
                ghmm_ll(k) = ...
                    log_lik_ghmm(test_records(target,feature_set,k).record,...
                    prior,...
                    train_hmm(j,num_states,feature_set).trans,...
                    train_hmm(j,num_states,feature_set).mu,...
                    train_hmm(j,num_states,feature_set).sigma);
            end % test record loop
            ghmm_ll = ghmm_ll';
            ghmm_results(target,num_states,seq_length,feature_set).log_lik(:,j) = ...
                ghmm_ll;
        end % end feature set loop
    end % end against j trained hmm loop
end % end data target type loop

save([pwd,'\output'],'ghmm_results','-append')




B.8 test_ANN.m

function test_ANN(num_states, seq_length)

% Tim Albrecht
% AFIT/ENS
% Sep 2005

% 'test_ANN' takes the log-likelihoods produced by the HMMs when given test
% sequences and feeds them to the trained ANN fusers.

% load the trained ANN networks and preprocessing values
load([pwd,'\param'],'ann','min_d','max_d')
% 'ann(num_states,seq_length).net'

% load settings
load([pwd,'\data'],'settings')
% settings.feature_sets = {'HH','VV'};
% settings.feature_num = [10 10];
% settings.targets = {'target_1','target_2','target_5','target_10',...
%     'target_13','target_6','target_7','target_11',...
%     'target_12','target_15'};

% load the output of the HMM classifiers
load([pwd,'\output'],'ghmm_results','ann_post')
% 'ghmm_results(target,num_states,seq_length,feature_set).log_lik = ...
%     [test records, label_class]'
% 'ann_post(num_states,seq_length).post = ...
%     [records, label_class]'

% build test set for ANN
data = [];
for feature_set = 1:size(ghmm_results,4) % num of feature sets
    temp_holder = [];
    for target = 1:size(ghmm_results,1) % num targets
        temp_holder = ...
            [temp_holder ...
            ghmm_results(target,num_states,seq_length,feature_set).log_lik'];
    end
    data = [data; temp_holder];
end % end feature set loop

% transform data to -1 1 range using min/max parameters
data = tramnmx(data,min_d,max_d);
temp = sim(ann(num_states,seq_length).net,data)';

% convert from (num_states, seq_length) which is matrix with
% num_rows=num_targets*num_records_per_target and num_cols=num_targets,
% to (data_class, num_states, seq_length) which is matrix with
% num_rows = num_records_per_target and num_cols = num_targets
for data_class = 1:length(settings.targets) % index into data class
    m = ...
        size(ghmm_results(data_class,num_states,seq_length,1).log_lik,1);
    ann_post(data_class,num_states,seq_length).post = ...
        temp(((data_class-1)*m + 1):(data_class*m),:);
end

% save ANN fuser output
save([pwd,'\output'],'ann_post','-append')
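
The final loop of the listing above regroups the fused network output by true data class. As a rough illustration in Python (the name `split_fused_output` is hypothetical, and the sketch assumes every class contributes the same number of records, stored contiguously in class order), the network output with one row per output node and one column per record is transposed and then cut into equal blocks:

```python
def split_fused_output(net_out, num_classes):
    """Regroup network output by true class.

    net_out has one row per output node and one column per record, with
    records ordered class by class. Returns one block per class, each a list
    of records, each record a list of per-node outputs.
    """
    by_record = [list(col) for col in zip(*net_out)]  # transpose to records x nodes
    if len(by_record) % num_classes != 0:
        raise ValueError("records do not divide evenly among classes")
    m = len(by_record) // num_classes  # records per class
    return [by_record[c * m:(c + 1) * m] for c in range(num_classes)]
```

Each returned block plays the role of one `ann_post(data_class, ...).post` matrix: rows are records of that class and columns are the fuser outputs for each in-library label.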
B.9 test_outlibrary.m

function test_outlibrary(num_states, seq_length)

% Tim Albrecht
% AFIT/ENS
% Sep 2005

% 'test_outlibrary' performs out of library record testing. It builds a
% set of 100 test records drawn from 5 out of library targets (20 records
% each).

% load settings
load([pwd,'\data'],'settings')
% settings.feature_sets = {'HH','VV'};
% settings.feature_num = [10 10];
target_list = {'target_3','target_4','target_8','target_9','target_14'};

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% generate test sequences from out of library targets %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% create data structure to store test records
ool_records = struct('record',[]);
% ool_records(target,feature_set,record).record = [feature_dim observation];
ool_index = struct('prior',[]);
% ool_index(target,state,seq_length,record).prior = column vector
% containing prior distribution using prior aspect knowledge

% generate random starting aspect angles; create 20 start points per
% target type;

for target = 1:length(target_list) % num targets
    % pick random start points
    index = randperm(360);
    ool_test = sort(index(1:20));

    % insert code to consider prior target azimuth knowledge
    ang_cov = 360/num_states; % aspect angle wedge covered by single state
    ang_cov_half = ang_cov/2; % half-width
    state_cov = round(22.5/ang_cov_half); % num states needed to cover 45 deg
    state_prob = 1/state_cov; % uniform prior over number of reqd states

    % builds a uniform distribution across the appropriate number of hidden
    % states within the prior dist vector
    for i = 1:size(ool_test,2)
        temp = zeros(num_states,1);
        est_state = floor((ool_test(i)-1)/ang_cov)+1;
        if rem(state_cov,2) ~= 0 % odd num states
            temp(est_state) = state_prob; % mid-point
            for j = 1:(state_cov-1)/2




                % lower half
                if est_state - j <= 0
                    temp_index = ...
                        mod(est_state - j + num_states,num_states + 1);
                else
                    temp_index = est_state - j;
                end
                temp(temp_index) = state_prob;

                % upper half
                if est_state + j > num_states
                    temp_index = mod(est_state + j,num_states);
                else
                    temp_index = est_state + j;
                end
                temp(temp_index) = state_prob;
            end
        else % even num states
            for j = 1:state_cov/2
                % lower mid-points
                if est_state - j + 1 <= 0
                    temp_index = ...
                        mod(est_state-j+1+num_states,num_states+1);
                else
                    temp_index = est_state - j + 1;
                end
                temp(temp_index) = state_prob;

                % upper mid-points
                if est_state + j > num_states
                    temp_index = mod(est_state + j,num_states);
                else
                    temp_index = est_state + j;
                end
                temp(temp_index) = state_prob;
            end
        end
        ool_index(target,num_states,seq_length,i).prior = temp;
    end

    % build test sequences by target type
    for i = 1:size(ool_test,2) % num test records per target type
        for feature_set = 1:length(settings.feature_sets) % num feature sets
            % load the appropriate feature set/target data into structure
            % called 'data'
            load_str = [pwd,'\trial8_data\','ool_test_',...
                settings.feature_sets{feature_set}];
            target_str = [target_list{target},'_',...
                settings.feature_sets{feature_set}];




            data = load(load_str,target_str);
            eval(['data = data.',target_list{target},'_',...
                settings.feature_sets{feature_set},';'])

            % pull full feature data into temp structure
            temp = data.feature;
            temp = [temp temp]; % account for possibility of wrapping around
                                % from 360 degrees back to 1

            % crop to seq_length
            temp2 = temp(:,ool_test(i):(ool_test(i) + seq_length-1));
            ool_records(target,feature_set,i).record = temp2;
        end % end feature set loop
    end % end ool test sequence loop
end % end target type loop

save([pwd,'\data'],'ool_records','ool_index','-append')


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% test out of library sequences against in-library trained HMMs %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% load the trained HMM parameters
load([pwd,'\param'],'train_hmm','init_hmm')
% 'train_hmm(target,num_states,feature_set).prior = []'

% load the test sequences
load([pwd,'\data'],'ool_records','ool_index','settings')
% 'ool_records(target,feature_set,record#).record = [feature_dim obs];
% 'ool_index(target,state,seq_length).prior = row vector
% settings.feature_sets = {'HH','VV'};
% settings.feature_num = [10 10];
target_list = {'target_3','target_4','target_8','target_9','target_14'};

% load the output information
load([pwd,'\output'],'ool_results')
% 'ool_results(target,num_states,seq_length,feature_set).log_lik = ...
%     [test records, hmm class]'

for target = 1:length(target_list) % class 'ool target' data
    for j = 1:length(settings.targets) % against class 'j' trained hmms
        for feature_set = 1:length(settings.feature_sets) % num feature sets
            ghmm_ll = [];
            for k = 1:size(ool_records,3) % num test records
                % insert code to use prior aspect angle information
                prior = ...
                    ool_index(target,num_states,seq_length,k).prior;
                % for no knowledge use
                % prior = normalise(ones(num_states,1));
                ghmm_ll(k) = ...
                    log_lik_ghmm(ool_records(target,feature_set,k).record,...
                    prior,...
                    train_hmm(j,num_states,feature_set).trans,...
                    train_hmm(j,num_states,feature_set).mu,...
                    train_hmm(j,num_states,feature_set).sigma);
            end % test record loop
            ghmm_ll = ghmm_ll';
            ool_results(target,num_states,seq_length,feature_set).log_lik(:,j) = ...
                ghmm_ll;
        end % end feature set loop
    end % end against j trained hmm loop
end % end data target type loop

save([pwd,'\output'],'ool_results','-append')

% fuse with trained ANN networks

% load the trained ANN networks
load([pwd,'\param'],'ann','min_d','max_d')
% 'ann(num_states,seq_length).net'

% load fused ann posteriors
load([pwd,'\output'],'ann_post')

% build test set for ANN
data = [];
for feature_set = 1:size(ool_results,4) % num of feature sets
    temp_holder = [];
    for target = 1:size(ool_results,1) % num targets
        temp_holder = ...
            [temp_holder ...
            ool_results(target,num_states,seq_length,feature_set).log_lik'];
    end
    data = [data; temp_holder];
end % end feature set loop

% transform data to -1 1 range using min/max parameters
data = tramnmx(data,min_d,max_d);
temp = ...
    sim(ann(num_states,seq_length).net,data)';

% convert from (num_states, seq_length) which is matrix with
% num_rows=num_targets*num_records_per_target and num_cols=num_targets,
% to (data_class, num_states, seq_length) which is matrix with
% num_rows = num_records_per_target and num_cols = num_targets
for data_class = 1:length(target_list) % index into data class
    m = ...
        size(ool_results(data_class,num_states,seq_length,1).log_lik,1);
    ann_post(data_class,num_states,seq_length).ool_post = ...
        temp(((data_class-1)*m + 1):(data_class*m),:);
end

save([pwd,'\output'],'ann_post','-append')